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INTRODUCTION 

Antimicrobial  resistance  in  bacterial  pathogens  is  steadily  increasing  and  recognized  as  one  of  the  greatest 
threats  to  global  public  health.  This  is  of  particular  concern  to  the  Military  Health  System  since  recent  wars  in 
Afghanistan  and  Iraq  resulted  in  a  major  spike  in  the  number  of  wound  and  healthcare-acquired  infections  by 
bacteria  resistant  to  three  or  more  antibiotics  (i.e.,  multi-drug  resistant  organisms,  or  MDROs).  This  study  uses 
advanced  genomic  and  bioinformatics  approaches  coupled  to  cell  culturing  to  identify  and  characterize  factors 
that  influence  virulence  (including  antibiotic  resistance,  and  biofilm  formation)  in  multidrug-resistant  organisms 
(MDROs).  This  objective  will  be  accomplished  through  experiments  focusing  on  the  regulatory  responses  of 
both  target  and  challenger  organisms  in  bacterial-bacterial  and  bacterial-eukaryotic  interactions  likely  to  occur 
in  the  wound  environment.  Deep  next-generation  cDNA  sequencing  is  being  used  as  a  tool  to  dissect  the 
regulatory  networks  altered  during  these  organismal-level  confrontations.  Cell-cell  challenges  include:  MDROs 
versus  human  microbiome  organisms,  MDROs  versus  other  MDROs  and  MDROs  versus  the  human  host  via 
cell  culture.  It  is  anticipated  that  there  will  be  specific  interacting  partners  that  have  a  negative  effect  on  MDRO 
virulence,  while  other  interactors  will  induce  the  expression  of  genes  involved  in  virulence.  This  project  is  being 
carried  out  in  collaboration  with  the  Walter  Reed  Army  Institute  of  Research  Multidrug-resistant  Organism 
Repository  and  Surveillance  Network  (WRAIR  MRSN),  who  have  provided  isolates  of  highest  clinical 
importance  collected  from  military  health  care  facilities.  The  research  outcome  may  lead  to  novel  treatment  of 
combat  wound  infections  in  the  future. 

BODY 

We  have  fulfilled  the  deliverables  as  outlined  in  the  proposed  scope  of  work  by  1)  working  closely  with  MRSN 
to  establish  the  list  of  25  MDRO  isolates  for  genome  sequencing;  2)  sequencing  the  genomes  of  these  isolates; 

3)  incorporating  genomic  sequence  data  and  optical  maps  provided  by  MRSN  in  addition  to  our  sequence  data 
to  generate  the  best  possible  assemblies  (i.e.,  contiguous  pieces  of  genomic  sequence)  and  making  the  data 
publicly  available;  4)  selecting  and  obtaining  three  representative  MDROs  and  three  commensal  bacterial 
isolates  to  be  used  in  confrontation  assays;  5)  conducting  bacteria: bacteria  confrontation  assays  and  analyzing 
the  RNA-Seq  results;  6)  conducting  MDRO:human  dermal  fibroblast  (HDF)  co-culture  confrontation  assays 
and  analyzing  the  RNA-Seq  results;  and  7)  submission  of  all  raw  RNA-Seq  sequence  reads  to  NCBI  SRA 
public  repository.  The  fulfillment  of  these  deliverables  resulted  in  the  development  of  new  bioinformatics  tools 
that  were  applied  to  understand  the  diversity  of  Acinetobacter  baumannii  isolated  from  the  military  health 
system  and  the  evolution  of  antimicrobial  resistance  of  A.  baumannii  as  presented  in  a  publication  in  Genome 
Biology  [l](Appendix  I). 

Aim  1.  Months  1-6.  Sequence  characterization  of  25  MDROs  from  WRAIR  MRSN 

Task  la.  MDRO  bacterial  isolate  selection.  Select  25  high  priority  clinically  relevant  MDRO  isolates  for 
genome  sequencing.  Selection  will  be  based  on  recommendations  by  the  WRAIR  MRSN.  Criteria  include  drug 
resistance  profiles,  prevalence  in  the  military  environment  of  care  and/or  wound  infections,  and  novel 
phylogenetic  patterns.  WRAIR  MRSN  will  provide  to  JCVI  purified  bacterial  genomic  DNA  for  whole  genome 
sequencing  for  this  task,  and  bacterial  stock  culture  for  co-culture  assays  in  Tasks  2  and  3.  [Task 
COMPLETED] 

The  MRSN  specifically  collects,  characterizes  and  archives  unique  strains  in  order  to  avoid  overlap  or 
redundancy  with  other  agencies  such  as  the  CDC,  or  SENTRY  of  JMI  labs.  From  the  MRSN  collection,  the 
strains  that  have  been  selected  for  this  study  have  the  highest  clinical  and  military  relevance  as  determined  by 
active  duty  military  physicians,  infection  preventionists,  and  microbiologists.  Specifically,  strains  were  selected 
on  the  basis  of  one  or  more  or  all  of  the  following  characteristics:  1)  they  were  isolated  in  military  medical 
treatment  facilities  (MTFs)  in  Afghanistan  and  Iraq,  2)  they  were  obtained  from  fatal  infections,  3)  they  were 
involved  in  serious  outbreaks,  4)  they  were  collected  from  wound  infections,  5)  they  were  paired  isolates 
obtained  from  both  wound  and  bloodstream  or  respiratory  sites  within  the  same  patient,  6)  and  they  were  either 
extremely  or  pan-drug  resistant  (XDR/PDR),  and/or  they  were  resistant  to  the  agent  of 'last  hope'  for  PDR 
infections.  MTFs  of  origin  include  Combat  Health  Supports  (CHSs)  or  Coalition  Force  Hospitals  (CFHs)  in 
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Afghanistan  (AFG)  and  Iraq  (IRQ);  Level  or  Role  IV  referral  centers  in  Europe,  and  Level  or  Role  V 
MEDCENS  in  the  Continental  United  States  (CONUS).  Genomic  DNA  samples  of  acceptable  quantity  and 
quality  for  all  25  MDROs  were  received  by  JCVI  on  December  18,  2012.  The  list  of  isolates  as  well  as  database 
accession  numbers  is  listed  in  Table  1. 


Table  1.  List  of  High-Priority  MDRO  Isolates  Sequenced  in  this  Study 


# 

Sample  ID 

MTF 

SITES 

ISOLATlOf 

DATE 

BIOPROJEC 

BIOSAMPLE 

WGS  Master 

ORGANISM 

SOURCE 

Comment 

1 

MRSN  3405 

DC-Baltimore  metro#2 

LVL  5 

3/29/12 

PRJNA223610 

SAMN02906922 

JPIA00000000 

Acinetobacter  baumannii 

WOUND 

co-s 

2 

MRSN  3527 

DC-Baltimore  metro#2 

LVL  5 

Apr-1 1 

PRJNA22361 1 

SAMN02906923 

JPHZ00000000 

Acinetobacter  baumannii 

WOUND 

co-s 

3 

MRSN  3942 

DC-Baltimore  metro#2 

LVL  5 

Jul-11 

PRJNA223612 

SAMN02906926 

JPHY00000000 

Acinetobacter  baumannii 

WOUND 

CO-R 

4 

MRSN  4106 

DC-Baltimore  metro#2 

LVL  5 

Jul-11 

PRJNA223613 

SAMN02906927 

JPHX00000000 

Acinetobacter  baumannii 

WOUND 

CO-R 

5 

MRSN  58 

DC-Baltimore  metro 

LVL  5 

Jun-10 

PRJNA223614 

SAMN02906928 

JPHW00000000 

Acinetobacter  baumannii 

WOUND 

6 

MRSN  7339 

DC-Baltimore  metro 

LVL  5 

Sep-04 

PRJNA223615 

SAMN02906929 

JPHV00000000 

Acinetobacter  baumannii 

WOUND 

respiratory-wound  pair  (MRSN  7341)  from  same  patient 

7 

MRSN  7341 

DC-Baltimore  metro 

LVL  5 

Sep-04 

PRJNA223616 

SAMN02906930 

JPIB00000000 

Acinetobacter  baumannii 

RESPIRATORY 

respiratory-wound  pair  (MRSN  7339)  from  same  patient 

8 

MRSN  11489 

SWUS 

LVL  5 

8/30/12 

PRJNA223617 

SAMN031 60298 

JTEP00000000 

Enterobacter  hormaechei 

WOUND 

KPC+ 

9 

MRSN  11248 

SWUS 

LVL  5 

8/6/12 

PRJNA223618 

SAMN03486506 

LBLX00000000 

Enterobacter  hormaechei 

WOUND 

10 

MRSN  10204 

SW  US/San  Antontio 

LVL  5 

7/18/12 

PRJNA223619 

SAMN03486507 

LBLW00000000 

Escherichia  coli 

WOUND 

KPC+ 

11 

MRSN  11639 

DC-Baltimore  metro#3 

LVL  5 

7/14/12 

PRJNA223620 

SAMN03486508 

LBIN00000000 

Enterococcus 

WOUND 

12 

MRSN  3418 

DC-Baltimore  metro#2 

LVL  5 

4/29/1 1 

PRJNA223621 

SAMN03486509 

LBIM00000000 

Enterococcus 

WOUND 

Vancomycin-resistarffn/erococcu^VRE) 

13 

MRSN  4777 

DC-Baltimore  metro#2 

LVL  5 

10/9/11 

PRJNA223622 

SAMN03486510 

LBIL00000000 

Enterococcus  faecium 

WOUND 

Sent  as  ACB  but  confirmed/as/aec/um 

14 

MRSN  12230  (HE12) 

Central  America/Honduras 

LVL  3 

7/19/12 

PRJNA223623 

SAMN0348651 1 

LBLV00000000 

Klebsiella  pneumoniae 

WOUND 

15 

MRSN  1319 

DC-Baltimore  metro 

LVL  5 

2/11/09 

PRJNA223624 

SAMN031 60297 

JSVB00000000 

Klebsiella  pneumoniae 

WOUND 

16 

MRSN  2404 

DC-Baltimore  metro 

LVL  5 

7/27/09 

PRJNA223625 

SAMN03486512 

LBIK00000000 

Klebsiella  pneumoniae 

WOUND 

17 

MRSN  6902 

SW  US/San  Antontio 

LVL  5 

2/21/12 

PRJNA223626 

SAMN03486513 

LBLU00000000 

Klebsiella  pneumoniae 

WOUND 

KPC+ 

18 

MRSN  8611 

DC-Baltimore  metro 

LVL  5 

2/7/04 

PRJNA223627 

SAMN03486514 

LBIJ00000000 

Staphylococcus  aureus 

BLOOD 

blood-wound  pair  (MRSN  8613)  from  same  patient 

19 

MRSN  8613 

DC-Baltimore  metro 

LVL  5 

2/12/04 

PRJNA223628 

SAMN03486515 

LBII00000000 

Staphylococcus  aureus 

WOUND 

blood-wound  pair  (MRSN  8611)  from  same  patient 

20 

MRSN  317 

DC-Baltimore  metro 

LVL  5 

6/9/10 

PRJNA223629 

SAMN03486516 

LBIH00000000 

Pseudomonas  aeruginosa 

WOUND 

21 

MRSN  321 

DC-Baltimore  metro 

LVL  5 

7/29/10 

PRJNA223630 

SAMN03486517 

LBIG00000000 

Pseudomonas  aeruginosa 

WOUND 

22 

MRSN  2154* 

COALITION  AFG-South 

LVL  3 

3/4/11 

N.A. 

N.A. 

N.A. 

Providencia  stuartii 

BLOOD 

23 

MRSN  2761 

DC-Baltimore  metro 

LVL  5 

2/19/03 

PRJNA223633 

SAMN03486518 

LBIF00000000 

Staphylococcus  aureus 

WOUND 

24 

MRSN  3562 

DC-Baltimore  metro 

LVL  5 

4/22/1 1 

PRJNA223634 

SAMN03486519 

LBIE00000000 

Klebsiella  pneumoniae 

BLOOD 

blood-tissue  pair  (MRSN  3852)  from  same  patient 

25 

MRSN  3852 

DC-Baltimore  metro 

LVL  5 

4/24/1 1 

PRJNA223635 

SAMN03486520 

LBID00000000 

Klebsiella  pneumoniae 

TISSUE 

blood-tissue  pair  (MRSN  3562)  from  same  patient 

*Note,  the  finished  genome  was  sequenced  by  WRAIR  MRSN  and  deposited  in  GenBank  separately. 


Task  lb.  Genomic  library  construction  and_  whole  genome  sequencing.  Construct  barcoded  standard  Illumina 
libraries  from  bacterial  genomic  DNA  and  carry  out  library  QC.  Pool  libraries  and  perform  sequencing  using 
the  Illumina  HiSeq  sequencing  platform.  [Task  COMPLETED] 


We  successfully  completed  library  construction,  and  QC  validation  for  all  25  isolates  (Figure  1,  Table  2). 
These  results  indicated  that  the  libraries  were  of  expected  DNA  insert  size  (-300-400  bp)  and  quantity  (2,000 
pM  minimum)  required  for  Illumina  sequencing.  Genomic  DNA  was  submitted  to  the  JTC  (the  JCVI 
sequencing  facility)  on  January  7,  2013  with  libraries  completed  on  March  5,  2013.  This  delay  in  library 
construction  was  due  to  a  backlog  of  library  construction  requests  in  our  sequencing  facility. 


Figure  1.  Agilent  High-Sensitivity  DNA  Assay  Results 
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Table  2.  MDRO  Library  QC  Summary 


Sample  ID 

Library  Name 

Lib  BC 

Lib  Size 

Lib  vol  (uL) 

Cone  (pM) 

MRSN  11489 

1 1 489-PE-IL1  -1 

2ZSYA 

360 

13.9 

12,903 

MRSN  10204 

10204-PE-IL3-1 

2ZSYC 

381 

13 

17,821 

MRSN  11248 

1 1 248-PE-IL4-1 

2ZSYD 

376 

14.5 

13,806 

MRSN  11639 

11639-PE-IL5-1 

2ZSYE 

436 

11.8 

33,103 

MRSN  1319 

131 9-PE-IL6-1 

2ZSYF 

390 

13.1 

15,321 

MRSN  2154 

2154-PE-IL7-1 

2ZSYG 

421 

16.9 

14,156 

MRSN  2404 

2404-PE-IL8-1 

2ZSYH 

373 

16 

10,768 

MRSN  2761 

2761-PE-IL9-1 

2ZSYI 

392 

18.7 

22,170 

MRSN  317 

31 7-PE-IL1 0-1 

2ZSYJ 

392 

14.7 

12,959 

MRSN  321 

321 -PE-IL1 1-1 

2ZSYK 

408 

14.4 

9,989 

MRSN  3405 

3405-PE-IL12-1 

2ZSYL 

349 

12.6 

29,267 

MRSN  3418 

341 8-PE-IL1 5-1 

2ZSYM 

331 

15.8 

21,850 

MRSN  3527 

3527-PE-IL16-1 

2ZSYN 

365 

15.1 

37,619 

MRSN  3562 

3562-PE-IL20-1 

2ZSYO 

365 

14.9 

60,433 

MRSN  3852 

3852-PE-IL21-1 

2ZSYP 

360 

14.7 

27,448 

MRSN  3942 

3942-PE-IL23-1 

2ZSYQ 

361 

13.1 

13,922 

MRSN  4106 

4106-PE-IL24-1 

2ZSYR 

395 

13 

23,512 

MRSN  4777 

4777-PE-IL26-1 

2ZSYS 

372 

13.8 

13,501 

MRSN  58 

58-PE-IL27-1 

2ZSYT 

372 

11.2 

16,076 

MRSN  6902 

6902-PE-IL28-1 

2ZSYU 

365 

17.2 

15,256 

MRSN  7339 

7339-PE-IL29-1 

2ZSYV 

356 

16.6 

56,048 

MRSN  7341 

7341-PE-IL30-1 

2ZSYW 

419 

17.6 

55,981 

MRSN  8611 

8611-PE-IL40-1 

2ZSYX 

362 

15.5 

13,318 

MRSN  8613 

8613-PE-IL41-1 

2ZSYY 

395 

18 

27,301 

MRSN  HE12 

HE12-PE-IL2-1 

2ZSYB 

350 

10.4 

11,164 

Task  lc.  Sequence  assembly  and  magging.  De-multiplex  Illumina  reads  and  carry  out  de  novo  assembly  and 
reference-based  assembly  to  obtain  genome  contigs  for  the  MDRO  bacterial  isolates. 

We  successfully  obtained  high-quality  genomic  sequence  data  from  the  Illumina  HiSeq  2000  platform  for  the 
25  selected  MDRO  bacterial  isolates.  The  samples  were  loaded  onto  the  HiSeq  machine  on  April  1,  2013  and 
completed  on  April  13,  2013.  The  delay  getting  the  completed  libraries  loaded  onto  the  sequencing 
machine  was  due  to  a  backlog  of  sequencing  requests  in  the  JTC.  Raw  reads  were  binned  (de-multiplexed) 
by  and  trimmed  of  their  corresponding  Illumina  MID,  and  reads  were  assembled  into  contigs  initially  using  the 
Celera  Assembler.  We  found  that  ~  90-fold  sequence  coverage  works  best  to  produce  high-quality  assemblies 
from  short-read  Illumina  data;  therefore,  a  sampling  of  reads  sufficient  to  produce  90-fold  coverage  (based  on 
the  predicted  genome  size)  was  used  for  genome  assembly. 

Genome  quality  of  our  sequencing  data  was  monitored  in  two  ways:  read-based  genome  QC  and 
assembly  based  metrics.  The  JCVI  Genome  QC  pipeline  samples  sequence  reads  and  performs  BLAST-based 
protein  and  nucleotide  searches  against  comprehensive  non-redundant  NCBI  sequence  databases.  The  results 
for  all  but  2  of  the  isolates  matched  their  predicted  bacterial  species.  The  two  that  deviated  from  the  predicted 
organism  identity  were  MRSN  2761  and  MRSN  4777.  For  MRSN  2761,  it  was  originally  determined  to  be 
Staphylococcus  haemolyticus;  however,  BLAST  analysis  suggested  that  it  was  S.  aureus.  COL  Lesho's  group  at 
MRSN  subsequently  confirmed  that  MRSN  2761  was  indeed  S.  aureus  based  on  the  presence  of  a  S.  aureus 
(MRSA)-specific  gene,  qacB  [2],  For  MRSN  4777,  it  was  originally  identified  as  Enterococcus  faecium,  but 
BLAST  results  suggest  that  it  is  more  likely  a  Ralstonia  species.  Analysis  of  the  genome  assembly  metrics 
(total  contig  span  and  GC%)  showed  that  only  one  genome,  MRSN  4777,  varied  from  expected  values  (Table 
3).  For  E.  faecium,  the  genome  size  should  be  ~  3  Mbp  with  a  GC%  of  ~  33%;  however,  our  assembly  results 
indicated  a  predicted  genome  size  (i.e.,  total  contig  span)  to  be  -5  Mbp  with  a  GC%  of  -65%,  which  matches 
closely  with  Ralstonia  sp.  (5.2  Mbp,  65%  GC).  MRSN  4777  was  later  confirmed  by  COL  Lesho's  group  to  be 
E.  faecium  by  sub-culturing  from  the  freezer  stock  archive  on  blood  and  McKonkey  agar  plates  and  taxonomic 
identification  by  the  BD  Phoenix  Automated  Microbiology  System  (Becton  Dickinson),  Vitek  2  (Biomerieux) 
and  Microscan  Walkaway  96  (Siemens)  sytems.  In  addition,  we  have  subsequently  determined  that  the 
genomic  sequence  data  was  a  mixture  of  a  Ralstonia  sp.  and  E.  faecium.  Complete  GenBank  reference  genomes 
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were  used  to  separate  reads  into  bins  belonging  to  these  two  distantly  related  microorganisms,  one  containing 
reads  matching  the  Ralstonia  reference  (NCBI  BioProject  #PRJNA50545)  and  one  containing  reads  matching 
the  Enterococcus  reference  (NCBI  BioProject  #PRJNA87025).  Reads  from  each  bin  plus  reads  that  did  not 
map  either  reference  genome  were  de  novo  assembled  using  the  Celera  Assembler  [3],  Even  though  the 
resulting  MRSN  4777  Enterococcus  assembly  generated  contigs  with  just  9.14-fold  average  sequence  coverage, 
most  of  the  reference  genome  was  covered  (red  diagonal  line,  Figure  2). 


Figure  2.  Mapping  of  MRSN  4777  Contigs  to  E.faecium  Reference 
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Enterococcus  faecium  Aus0004 


Though  the  assembly  results  were  of  good  quality  for  Illumina-only  data,  we  worked  to  improve  upon  these 
results  by  incorporating  raw  sequence  data  and  optical  maps  that  were  generated  by  MRSN  as  described  in  our 
proposal.  We  had  to  establish  an  MTA  in  the  form  of  a  CRADA  (fully  executed  on  June  13,  2013)  in  order  to 
obtain  genomic  sequence  and  OpGen  optical  restriction  map  data  from  WRAIR  MRSN.  The  CRADA  with 
WRAIR  took  ~  2  months  to  get  established,  delaying  progress  towards  fulfilling  this  aim.  JCVI  received  a 
hard-drive  from  MRSN,  containing  genomic  sequence  reads  and  optical  maps,  on  July  9,  2013.  A  JCVI  in- 
house  NextGen  Sequence  Assembly  pipeline  integrated  the  JCVI  Illumina  HiSeq  data  generated  from  this 
project  (2x100  bp  reads),  the  WRAIR  MRSN  reads  generated  from  various  platforms  (Ion  Torrent,  Illumina 
MiSeq,  and  454),  and  OpGen  optical  restriction  maps.  Overall,  Auto  Gap  Closure  improved  the  genome 
assemblies  of  all  isolates  by  reducing  the  number  of  contigs,  increasing  the  contig  N50  and  increasing  the 
genome  span.  Optical  maps  improved  assemblies  by  reducing  the  number  of  contigs  by  1-7  contigs  per 
genome.  Nine  of  the  fourteen  genomes  with  optical  map  data  had  75%  or  more  of  contigs  placed  on  the 
restriction  map.  454  Junior  data  supplied  by  MRSN  contributed  the  most  to  gap  closure;  however,  for  at  least 
one  of  the  strains,  MRSN  1 1489,  the  Ion  Torrent  200bp  reads  aided  automated  closure  of  plasmids.  This 
pipeline  was  described  in  our  Genome  Biology  publication  [1],  Appendix  I. 
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Task  Id.  Months  4-6.  Genome  annotation  and  analysis.  Run  JCVI  prokaryotic  genome  annotation  pipeline  to 
identify  gene  coding  regions  in  the  bacterial  genome  assembly.  Identify  genomic  features  associated  with 
antibiotic  resistance  and  generate  multilocus  sequence  typing  data.  Submit  Illumina  reads,  genomic  sequence 
contig,  gene  annotation  to  the  public  NCBI  databases.  Share  sequence  and  analyzed  data  with  WRAIR.  [Task 
COMPLETED] 

-  MANUSCRIPT  PUBLISHED  IN  GENOME 
BIOLOGY  July  20,  2015  [1],  Appendix  I. 

UPDATE:  In  a  separate  analysis  funded  by  the  NIAID 
Genome  Centers  for  Infectious  Diseases  (GCID),  we 
performed  a  large  scale  phylogenetic  analysis  of  379 
Enterobacter  cloacae  complex  strains  (Figure  3).  Data 
collected  included  the  average  nucleotide  identity  (ANI)  of 
core  genes,  historical  hsp60  typing,  and  core  single 
nucleotide  polymorphisms  (SNPs).  From  this  analysis  it 
was  determined  that  the  WRAIR  would  isolates  MRSN 
1 1248  and  MRSN  1 1489  sequenced  in  this  study  originally 
thought  to  be  E.  cloacae  should  now  be  classified  as  E. 
hormaechei  subsp.  oharae.  In  addition,  a  third  WRAIR 
isolate,  MRSN  17626,  not  sequenced  in  this  study  also 
appears  misclassified  and  now  belonging  to  an  unclassified 
E.  hormaechei  sp. 


Aim  lb.  Months  6-12.  Perform  bacterial-bacterial 
confrontation  assays  to  monitor  bacterial  cell 
interactions  between  MDROs  and  the  human  skin 
microbiome,  and  interactions  between  MDROs. 


Task  2a.  Months  6-9.  Bacterial-bacterial  confrontation 

assay.  Select  3  MDRO  isolates  from  Aim  la  and  3 
common  skin  microbes.  Propagate  bacterial  isolates  on  solid  media.  Set  up  biological  replicates  representing 
pairwise  bacterial-bacterial  co-culture  between  individual  MDROs  and  skin  microbes  and  between  individual 
MDROs.  Set  up  controls  for  self-self  interactions  for  each  bacterial  species.  Harvest  bacterial  cells  from  the 
co-culture  assays.  Repeat  Task  2a  for  experimental  replicates.  [Task  COMPLETED] 

Three  MDROs  (Acinetobacter  baumannii  MRSN  7339,  Klebsiella  pneumoniae  MRSN  1319,  and  Enterobacter 
hormaechei  MRSN  1 1489)  and  three  commensal  organisms  ( Staphylococcus  epidermidis  SK-135,  Lactobacillus 
reuteri  SD21 12  ATCC  55730  and  Corynebacterium  jeikeium  ATCC  43734)  were  used  in  pair-wise  confrontation 
assays  as  depicted  in  Table  3.  We  choose  L.  reuteri  over  Propionibacterium  acnes  because  P.  acnes  is  anaerobic, 
which  is  incompatible  with  growth  requirements  of  the  MDROs  and  remaining  commensals. 

Prior  to  commencing  with  confrontation  assays,  it  was  necessary  to  determine  accurate  growth  curves  of 
each  organism  so  that  equal  numbers  of  actively  dividing  cells  will  be  used  in  the  co-culture  assays.  Growth 
curves  for  each  organism  were  conducted  in  triplicate.  S.  epidermidis  SK-135,  C.  jeikeium,  L.  reuteri,  A. 
baumannii  MRSN  7339  and  A.  baumannii  MRSN58  were  cultured  in  BHI  broth.  L.  reuteri  was  cultured  in 
MRS  broth  (Figure  4). 


Figure  3:  Phylogenetic  SNP  tree  of 
Enterobacter  cloacae  complex  genomes. 

A  whole  genome  core  SNP  tree  was 
constructed  for  379  Enterobacter  cloacae 

complex  genomes  using  kSNP  and  RaXM). 

The  asterisk  indicates  the  Enterobacter 

genome  used  in  confrontation  assays. 
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Figure  4.  Growth  Curves  of  selected  microbes  for  confrontation  assays 


Lactobacillus  reuteri  ATCC  55730 
— Corynebacterium  jeikeium  ATCC43734 
-  *  -  Staphylococcus  epidermidis  SKI 35 
Staphylococcus  aureus  USA300 
-m-  Escherichia  coli  K12 
-m-  Acinetobacter  baumannii  MRSN7339 
-•-Acinetobacter  baumannii  MRSN58 


We  have  also  established  and  validated  our  co-culturing  plate  method,  and  RNA  extraction  procedures.  Our 
results  showed  that  plate  confrontations  using  previously  proposed  interlocking  stamps  for  inoculation  resulted 
in  poor  biological  reproducibility.  We  have  therefore  developed  a  new  approach  by  first  mixing  two  bacterial 
liquid  cultures  to  homogeneity  prior  to  inoculation  on  a  plate  to  produce  a  confluent  lawn  of  growth.  E.  coli  K- 
12  and  S.  aureus  USA300  (Figure  3)  were  used  in  a  control  experiments  to  validate  our  co-culturing  and  RNA 
extraction  methods  since  we  already  have  gene  expression  data  and  complete  genome  sequences  available  for 
comparison.  Quantitative  PCR  (qPCR)  was  used  to  measure  relative  gene  expression  of  S.  aureus  USA300 
genes  that  were  up  or  down  regulated  in  the  presence  of  E.  coli  K-12,  using  gyrB  as  a  constitutively  expressed 
reference  gene.  Preliminary  results  showed  that  the  new  co-culture  technique  was  reproducible  across 
experimental  replicates. 


Table  3.  Bacteria-Bacteria  Confrontation  Assays 


Target  Species 

Commensal 

Controls 

Challenger  species 

E. 

hormaechei 

K. 

pneumoniae 

A. 

baumannii 

EC 

KP 

AB 

S.  epidermidis 

SE 

SE:SE 

EC:SE 

KP:SE 

AB:SE 

L.  reuteri 

LR 

LR:LR 

EC:LR 

KP:LR 

AB:LR 

C.  jeikeium 

CJ 

CJ:CJ 

EC:CJ 

KP:CJ 

AB:CJ 

E.  hormaechei 

EC 

EC:EC 

KP:EC 

AB:EC 

K.  pneumoniae 

KP 

KP:KP 

AB:KP 

A.  baumannii 

AB 

AB:AB 

Bacteria-bacteria  confrontations.  Commensal  bacteria  were  obtained  from  ATCC/BEI  Resources.  A. 
baumannii,  K.  pneumoniae,  E.  hormaechei,  and  S.  epidermidis  were  grown  in  BHI  broth  (BD)  to  a 
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concentration  of  1  x  108  CFU/ml  in  log  phase  growth.  C.jeikeium  was  grown  to  the  same  concentrations  in 
BHI  supplemented  with  1%  (vol/vol)  Tween-80  (Fisher).  For  confrontations,  equal  amounts  of  cells  from  the 
cultures  were  combined  and  shaken  at  37°C  for  5  min  at  200  RPM.  1  x  107  CFU  of  each  confrontation  were 
plated  on  to  a  BHI  agar  plate,  each  in  triplicate.  Plates  were  incubated  at  37°C  for  6  hours,  at  which  point 
approximately  1  x  108  CFU  from  each  triplicate  plate  were  combined  and  treated  with  RNAprotect  bacteria 
reagent  (Qiagen)  according  to  the  manufacturer’s  instructions.  Bacterial  cells  were  spun  down  and  the 
RNAprotect  bacterial  reagent  removed  after  treatment.  Cell  pellets  were  stored  at  -80°C  until  RNA  extraction. 

Task  2b.  Months  9-12.  Bacterial  RNA-Seq  and  gene  expression  analysis.  Isolate  total  RNA  and  deplete 
bacterial  ribosomal  RNA  from  sample.  Construct  barcoded  ssRNA-Seq  libraries  for  Illumina  sequencing.  De¬ 
multiplex  Illumina  reads  and  carry  out  de  novo  and  reference-based  transcript  assembly  to  reconstruct  gene 
expression  levels  in  the  bacterial  transcriptome  during  co-culture  of  the  target  and  challenger  species.  [Task 
COMPLETED].  *NOTE:  A  manuscript  describing  this  work  is  being  edited  for  submission*. 

RNA  isolation.  RNA  was  extracted  using  RNeasy  Mini  Kit  (Qiagen).  The  manufacturer’s  protocol  was 
followed  with  a  few  additional  steps.  Enzymatic  lysis  was  performed  on  cell  pellets  when  resuspended  in  TE 
with  15  mg/ml  lysozyme  (Fisher)  and  3  mg/ml  Proteinase  K  (NEB)  and  incubated  for  10  min  at  room 
temperature  with  vortexing  for  10  s  every  2  min.  After  enzymatic  lysis,  mechanical  lysis  was  performed  by 
adding  buffer  RLT  with  1%  vol/vol  2-mercaptoethanol  and  transferring  the  lysate  to  a  Lysis  Matrix  B  bead 
beating  tube  (MP  bio).  Samples  were  homogenized  for  45  s  at  6.5  m/s  on  a  FastPrepl20  Cell  Disrupter  System. 
On-column  DNAse  treatment  (Qiagen)  was  used  according  to  the  manufacture’s  protocol.  Isolated  RNA  was 
treated  with  TURBO  DNase  (Ambion)  following  the  manufacture’s  protocol. 

Library  preparation  for  Illumina  sequencing.  Enrichment  for  mRNA  was  done  using  the  RiboZero  rRNA 
Removal  Kit  for  bacteria  (Epicentre).  Libraries  were  constructed  from  up  to  10  ng  of  mRNA  using  the 
NEBNext  Ultra  Directional  RNA  Library  Prep  Kit  (NEB)  following  the  manufacturer’s  protocol.  Libraries 
were  normalized  for  sequencing  using  qPCR  (Kapa  Biosystems  Library  Quantification  Kit).  The  resulting 
libraries  were  sequenced  on  the  Illumina  HiSeq  platform  (2x100  bp). 

RNA-Seq  data  analysis.  Illumina  sequencing  reads  in  FASTQ  format  were  analyzed  using  CLC  Genomics 
Workbench  version  6.5  (http://www.clcbio.com)  after  trimming  for  low  quality  reads  (CLC  quality  score  limit  = 
0.05,  maximum  of  2  ambiguities),  reads  less  than  50  bp  and  adaptors.  Known  4,692  distinct  rRNA  sequences 
were  collected  from  SILVA  database  (http://www.arb-silva.de/)  for  all  five  species  of  bacteria  used  in 
confrontations  and  mapped  using  BWA  [4]  allowing  up  to  three  mismatches.  Reference  files  for  RNA-Seq 
analysis  were  prepared  from  Genbank  files  imported  into  CLC  (Table  1).  All  rRNA  reads  were  removed  from 
further  analysis,  leaving  only  non-rRNA  reads  for  normalization  and  analysis.  For  each  control  and  co-culture 
experiment,  the  expected  genomes  were  selected  and  reads  were  mapped  using  the  following  setting:  maximum 
number  of  mismatches  =  2,  minimum  length  fraction  =  0.8,  minimum  similarity  fraction  =  0.98,  maximum 
number  of  hits  for  a  read  =10,  minimum  and  maximum  distances  for  paired  reads  =1,1000  and  counting 
scheme  of  “include  broken  pairs”.  The  expression  values  were  measured  in  RPKM  (Reads  Per  Kilobase  Per 
Million  mapped  reads).  For  confrontations,  RPKM  values  must  be  adjusted  since  the  denominator  for  the 
normalization  is  the  sum  of  reads  mapped  to  any  of  the  two  genomes.  The  adjustment  formula  for  the  RPKM 
value  for  a  gene  in  species  A  is:  (RPKM_adjusted_in_A  =  RPKM_in_A  *  (N_A+N_B)  /  N_A)  where  N_A  = 
total  number  of  reads  mapped  to  genome  A,  and  N_B  =  total  number  of  reads  mapped  to  genome  B.  The  total 
gene  reads  were  used  as  raw  read  count  to  determine  up  or  down  regulated  genes  using  default  settings  in 
EdgeR  ((http://bioconductor.org)  [5], 

All  raw  Illumina  sequence  read  data  was  deposited  into  the  NCBI  SRA  (Table  4).  NCBI  enables  Biosamples  to 
be  linked  to  more  than  one  Bioproject  so  that  if  a  researcher  is  interested  in,  for  example  L.  reuteri,  they  will 
have  access  to  every  Biosample  (i.e.,  experiment  involving  L.  reuteri ),  by  selecting  the  L.  reuteri  Bioproject. 
Confrontations  (i.e.,  Biosamples)  between  L.  reuteri  and  other  bacteria  will  also  be  linked  via  the  Bioprojects  of 
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the  confronting  organism.  We  have  linked  Biosamples  to  multiple  Bioprojects,  which  are  now  neatly  housed 
under  one  “umbrella”  Bioproject  (Table  4). 


Table  4.  Confrontation  Raw  Reads  Deposited  in  the  NCBI  SRA 


Biosample_idExperiment_iclConfrontations 


Biological  Replicate  BioProject  ic 


_  SAM  N 04532757  _ 
'  SAM  N  045327 5j8  ' 
"  SAM  N 045327 519  " 


ABABA  jA  baumannihn\y 


292770 

'292770" 

292770 


SAMN0453276P 

^SAMN045327^ 


ABCJD 

ABECA 


A.  baumanniivs. C.  jeikeium 
A.  baumanniivs.E.  hormaechei* 


292770 

292770 


..  SAM  N 045327682.  _ 
SAMN0453276I3 
'  SAM  N 04532764" 


ABECD 

ABKPA 

"abkpd" 


A.  baumanniivs.E.  hormaechei* 
A.  baumanniivs.K.  pneumoniae 
A.  baumanniivs.K.  pneumoniae 


292770 

292770 

"292770" 


SAMN0453276:5 

SAMN0453276-6 


ABLRA 

ABLRD 


A.  baumanniivs.L.  reuteri 
A.  baumanniNS.L.  reuteri 


292770 

292770 


SAMN04532767 

SAMN0453276i8 


ABSEA 

ABSEG 


A.  bdtumanniNS.S.  epidermidis 
A.  baumanni'NS.S.  epidermidis 


292770 

292770 


SAMN0453276I9 
'  SAM  NO'4'532776' ' 
SAMN0453277-1 


CJCJA 

"cjcjd" 

CJECA 


C.  jeikeiumonly 
C.  jeikeiumon\y_ 

C.  jeikeiurrus.E.  hormaechei 


294781 

"294781" 

294781 


SAMN0453277i2 

"SAMN0453277i3 


CJECD 

CJKPA 


C.  jeikeiurws.E.  hormaechei 
C.  jeikeiurrh/s.K.  pneumoniae 


294781 

294781 


SAM IN 104532774 
"  SAM  N0453277j5 
SAMN0453277fe 


CJKPE 

ECECA 

ECECD 


C.  jeikeiurrNs.K.  pneumoniae 
E.  hormaecheidnty 
E.  hormaecheid nly 


294781 
292777 
" 292777 " 


SAMN04532777 

SAMN04532778 


ECKPA 

ECKPD 


E.  hormaecheiVs  X.  pneumoniae 
E.  hormaechei\sK.  pneumoniae 


292777 

292777 


SAMN04532779 

SAMN0453278P 


ECLRA 

ECLRD 


E.  hormaecheiys .L.  reuteri 
E.  hormaecheiVs.L.  reuteri 


292777 

292777 


SAM  N 0453278:1 
"SAMN045327^ 
SAM  N0453278P 


ECSEA 

ECSEG 

kpkpa" 


E.  hormaecheiys .S.  epidermidis 
E.  hormaechei\s.S.  epidermidis 
K.  pneumoniaepn 1  ly 


292777 
292777 
" 292776 " 


SAMN04532784 

SAMN0453278I5 


KPKPD 

KPLRA 


K.  pneumoniaepnly 
K.  pneumoniae/s.L.  reuteri 


292776 

292776 


SAMN0453278P 

SAMN0453278I7 


KPLRD 

KPSEA 


K.  pneumoniae/s.L .  reuteri _ 

K.  pneumoniae/s.  S.  epidermidis 


292776 

292776 


SAMN0453278I8 

'SAMN0453278i9" 

SAMN0453279D 


KPSEG 

"lrlra" 

LRLRD 


K.  pneumoniae/s.S.  epidermidis 

L.  reuteri or\\y 
L.  reuteri or\\y 


292776 

"294773" 

294773 


SAMN0453279:1 

SAMN0453279i2 


SESEI 

SESEJ 


S.  epidermidis) nly 
S.  epidermidis^ nly 


294780 

294780 


SAMN04532793 

AB3 

A.  baumanniino  human  cells 

1 

292770 

SAM  N 045327 94 

AB8 

A.  baumanniino  human  cells 

2 

292770 

SAMN0453279j5 

EC3 

E.  hormaecheifio  human  cells 

1 

292777 

SAM  N 045327 9|6 

EC8 

E.  hormaecheifio  human  cells 

2 

292777 

SAM  N 045327 97 

KP3 

K.  pneumoniaeno  human  cells 

1 

292776 

SAM  N 045327 9j8 

KP8 

K.  pneumoniaeno  human  cells 

2 

292776 

SAM  N 04 5348 1 7 

HDF1 

HDF  cells  only 

1 

314266 

SAM  N 04 5348 1 18 

HDF4 

HDF  cells  only 

2 

314266 

SAMN045348S 

ABHDF1 

A.  baumannii  i/sHDF  cells 

1 

314266 

SAMN0453482P 

ABHDF6 

A.  baumannii  i/sHDF  cells 

2 

. 314266 . 

SAM  N 04 53482:1 

ECHDF1 

E.  hormaechei  i/iHDF  cells 

1 

314266 

SAMN0453482I2 

ECHDF6 

E.  hormaechei  i/dHDF  cells 

2 

314266 

SAM  N0453482I3 

KPHDF1 

K.  pneumoniae  i/sHDF  cells 

1 

314266 

SAMN0453482kt 

KPHDF6 

K.  pneumoniae  i/sHDF  cells 

2 

314266 

*  MRSN  11489,  originally  named  E.  cloacae,  has  been  renamed  to  E.  hormoechei  subsp.  ohoroe 


Reverse  Transcription  and  qRT-PCR.  Reverse  transcription  (RT)  was  performed  to  generate  cDNA  from  both 

biological  replicates  to  be  used  in  qRT-PCR  for  validation  of  RNA-Seq  data.  RNA  concentrations  were 

determined  with  an  Agilent  2100  Bioanalyzer  and  1.5  pg  of  total  RNA  were  used  in  each  reaction.  RNA  was 

incubated  in  18.5  pi  with  6  pg  of  random  hexamers  (Invitrogen)  and  40  U  of  RNaseOUT  (Invitrogen)  at  70°C  for 

10  min,  followed  by  snap  cooling  on  ice.  Then  the  reaction  was  brought  to  IX  First  Strand  Buffer,  10  mM  DTT, 

0.5  mM  dNTPs  (Invitrogen),  and  400  U  Superscript®  III  Reverse  Transcriptase  (Invitrogen)  in  30  pi.  The 

reaction  was  incubated  in  a  42°C  water  bath  for  16  h.  and  then  reaction  was  stopped  and  RNA  was  hydrolyzed 

with  0.1  M  EDTA  and  0.2  M  NaOH  at  65°C  for  12  min.  Tris-HCl  (pH  7.0)  was  added  to  neutralize  the  pH. 

Purification  of  cDNA  was  achieved  by  using  QIAquick  PCR  Purification  kit  (Qiagen)  following  the 

manufacturer’s  protocol  and  quantitated  by  fluorometric  assays  using  SYBR  Gold  (Invitrogen). 
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qRT-PCR  reactions  were  performed  with  10  ng  of  template  cDNA  in  10  pi  triplicated  reactions  with 
SYBR  Green  Master  Mix  (Roche)  in  384-well  plates  on  the  LightCycler480  qPCR  instrument  (Roche).  Cycling 
conditions  were  95°C  for  10  min  followed  by  50  cycles  of  95°C  for  20  s,  58°C  for  30  s,  and  72°C  for  15  s.  A 
melting  curve  analysis  was  performed  after  all  cycles  were  completed  to  verify  the  Tm  of  amplified  products. 
No-RT  controls  and  qPCR  control  reactions  were  performed  as  well.  To  determine  the  CP  for  each  reaction,  the 
‘Fit  Point  Method’  was  performed  in  the  LightCycler480  software  1.5.0  (Roche  Diagnostics).  Efficiency  for 
each  primer  was  optimized  by  adjusting  primer  concentration  and  was  determined  for  the  reaction  conditions. 
Efficiency  and  CP  values  were  used  to  calculate  the  gene  expression  with  normalization  to  gyrB.  Relative 
expression  ratios  were  determined  by  comparing  co-culture  samples  with  monoculture  samples  using  the 
method  described  previously  [6],  Primer  sequences  and  concentrations  used  are  in  Table  5. 


Table  5.  Sequence  and  concentration  of  qPCR  primers  used 


Target  Gene 

Primer  Name 

Primer  Sequence 

Concentration  of  primer  in 
qPCR  reactions 

A.  boumonnii  primers 

DNA  gyrase,  B  subunit 

T634_0013  Fwd 
T634_0013  Rvs 

T  GAGCTTGCTTTACATATTAGTGCT 

ACTTCGAGTAATGCATCCAGTAAG 

0.2  uM 

conserved  hypothetical  protein 

T634_0987  Fwd 
T634_0987  Rvs 

GCGAGCCATACCGTTTACTAT 

AAGT  GTACCCGCACCATAATTT 

1  uM 

ethylmalonate-semialdehyde 

dehydrogenase 

T634_0140  Fwd 
T634_0140  Rvs 

TGAAGAAGTCGACGCTGCTA 

TCAGCCGTCAATACTTGTGC 

0.5  uM 

aldehyde  dehydrogenase  (NAD)  T634_2358  Fwd 
family  protein  T634_2358  Rvs 

AAAG  CTT  CAT  G  G  AATAAAT  CTT  C  AC 

AAGGT  CAGCTGCTAAAGTTT  CAC 

0.2  uM 

PapD  pilus/flagellar-assembly 
chaperone  N-terminal  domain 

T634_2490  Fwd 
T634_2490  Rvs 

CCAAATGCGTTACTCAATTCC 

AGCT  CAGACTGGCCTT  GTT  G 

0.5  uM 

Hpt  domain  protein 

T634_3235  Fwd 
T634_3235  Rvs 

ATCTTAATATTTGGGTAGGTGAGCA 

AAC  G  CT  C  ATAG  AT  G  CTTT  CTAATT  C 

0.2  uM 

EamA-like  transporter  family 
protein 

T634_3500  Fwd 
T634_3500  Rvs 

GCAATATGTGACTGCGGTTG 

T  CTCCACT  GACCACCAACAC 

0.5  uM 

fimbrial  assembly  protein  PilN 

T634_3585  Fwd 
T634_3585  Rvs 

CGT  CCAGTAGTGGTAAGACTT  G 

GGAGTAGTTCGGCTACTGTATTT 

0.2  uM 

K.  pneumoniae  primers 

PTS  system  fructose  IIA 
component 

T643_A0079  Fwd 
T643_A0079  Rvs 

TTTGCCGGTAGCGT  GAATA 

CGTTTGTTAATGCCTCGGTAAT 

1  uM 

SIS  domain  protein 

T643_A0084  Fwd 
T643_A0084  Rvs 

CAGGACGAGTACCT  GACCAGT 

C  AAATT  C  ATTAATT  G  C  CAT  CAT  C 

1  uM 

citrate/malate  transporter 

T643_A0191  Fwd 
T643_A0191  Rvs 

AGCTGGCTAACGTGGTGTCT 

T  GAAGGCGTTTACCAGTTCC 

1  uM 

aconitate  hydratase  2 

T643_A0266  Fwd 
T643_A0266  Rvs 

ATAGCCACAAAGGCCAACTG 

ACCCATACACAGGGAACAGC 

1  uM 

nitrite  extrusion  protein  1 

T643_A2635  Fwd 
T643_A2635  Rvs 

GCCTTATTACGCGTGCCTTA 

GATGCTAAACGGCGTAGAGG 

1  uM 

4-aminobutyrate  transaminase 

T643_A4354  Fwd 
T643_A4354  Rvs 

CAT  CTGCAGGAAGTGCT  CAA 

GTTCTCCTGCGCTTTCTGTT 

1  uM 

DNA  gyrase,  B  subunit 

T643_A4645  Fwd 
T643_A4645  Rvs 

CT  GTT  GACCTT  CTT  CTAT  CGT  CAG 

GATGGAGAT  CT  GATACT  GATCCA 

1  uM 
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Response  of  A.  baumannii 

The  response  of  A.  baumannii  co-cultured  with  K.  pneumoniae,  E.  hormaechei  or  C.  jeikeium  is  illustrated  in 
Figure  5,  panel  A-C. 

Upregulated  genes.  There  are  only  28  commonly 
upregulated  genes  when  A.  baumannii  was  in  co¬ 
culture  with  K.  pneumoniae,  E.  hormaechei  or  C. 
jeikeium.  We  also  observed  the  differential 
regulation  of  some  genes  common  to  all 
confrontations  suggesting  a  general  stress 
response  or  growth  phase  associated  gene 
expression. 

In  co-culture  with  K.  pneumoniae,  of  the 
190  A.  baumannii  genes  upregulated,  only  49  of 
these  were  specific  to  this  confrontation  (see 
Figure  5,  Panel  A).  Among  the  49  uniquely 
upregulated  genes  specific  to  the  confrontation 
with  K.  pneumoniae,  8  flexible  Genomic  Island 
(fGI)  classified  genes  were  identified  of  which 
three  are  phage  associated,  one  is  fGI,  comM  [1] 
and  the  remaining  of  unknown  function,  with  one 
protein  identified  as  the  carbon  starvation  protein 
CstA.  Other  uniquely  upregulated  genes  include 
genes  associated  cell  envelope,  other  cellular 
processes,  energy  metabolism  and  one  Ton-B- 
dependent  siderophore  receptor,  efflux  and  ABC 
type  transporters  and  some  hypotheticals. 

With  the  E.  hormaechei  co-culture,  160  A. 
baumannii  genes  were  upregulated,  of  which  only 
35  were  specific  to  this  bacterial  confrontation. 

The  35  uniquely  upregulated  genes  specific  in 
confrontation  with  E.  hormaechei  include:  two 
fGI  genes  that  appear  to  be  involved  in  energy 
metabolism  (succinylomithine 

transaminase/acetylomithine  aminotransferase 
domain  protein  and  arginine  N- 
succinyltransferase),  eight  transport  and  binding 
family  proteins  that  include  Ton-B  dependent 
receptors,  ABC  transporters  and  some  iron-chelate  and  siderophore  interacting  proteins. 

The  most  number  of  differentially  regulated  genes  in  any  confrontation  was  observed  when  A.  baumannii 
was  co-cultured  with  the  commensal  C.  jeikeium  and  in  this  confrontation,  1 85  genes  were  differentially  expressed 
of  which  141  genes  were  unique  to  this  confrontation.  About  half  of  these  are  categorized  as  hypotheticals  or 
unknown  function,  but  in  contrast  to  the  MDRO-MDRO  confrontations,  we  observed  specific  upregulation  of 
regulatory  functions  and  cell  envelope  related  functions.  More  fGI  and  fGI  phage  genes  were  also  differentially 
expressed  in  this  particular  co-culture.  Most  of  these  (12)  are  hypothetical  proteins,  but  also  includes  a  cold  shock 
protein  CspE  and  CRISPR  associated  proteins.  One  distinct  phenomenon  was  the  upregulation  of  Type  IV  pili 
genes  only  observed  in  the  AB-CJ  confrontation. 

Of  the  highest  upregulated  genes  (>  1.5  log  FC),  20  were  in  specific  response  to  E.  hormaechei  (Figure 
5,  panel  B)  There  are  genes  of  transport  and  binding  category,  including  some  TonB  -dependent  receptors  and 
ABC  transporters,  and  the  remaining  appear  to  be  involved  in  energy  metabolism  and  other  cellular  processes. 
No  acquired  antibiotic  resistance  genes  were  induced.  In  contrast  to  the  strong  response  of  K.  pneumoniae  to  A. 
baumannii,  the  response  of  Acinetobacter  towards  Klebsiella  appears  to  be  mild  with  only  10  highly  upregulated 


A.  baumannii  genes 

with  C.  jeikeium 

with  E.  cloacae 

Q  with  K.  pneumoniae 

Up-reg 

K.  pneumoniae  genes 

with  C.  jeikeium 

with  E.  cloacae 

with  A.  baumannii 

ulated 

A  €# 

141 

+  28 16  ^ 

\  35  \§ 49  j*  yW 

D  * 
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B  1 

a  jj+h 

■---4  CO 

• 

LU 

Down-regulated 

191 

0  19 

F  * 

28 

_  29  44  - 

4-i  82  ”  352  -  ^ 

Non-coding  RNA  | |  Unknown  function 

Transport  and  binding  proteins  1 1  Signal  transduction 

Regulatory  functions  1 1  Energy  metabolism 

Protein  fate  ■  ■  Cellular  processes 

Mobile  and  extrachromosomal  element  functions  |  H  CeH  envelope 
fGI  ■  ■  fGI,  Phage 

Figure  5.  Genes  differentially  expressed  (DE)  in  bacterial  confrontations. 

DE  is  defined  as  a  FDR  of  <  0.05.  Left  panels  are  counts  of  A.  baumannii  DE 
genes  when  confronted  with  C.  ieikeium  (pink  circle),  E.  hormaechei  (yellow 

circle),  or  K.  pneumoniae  (red  circle).  Right  panels  are  counts  of  K.  pneumoniae 
DE  genes  when  confronted  with  C.  jeikeium  (pink  circle),  E.  hormaechei  (yellow 
circle),  or  A.  baumannii  (blue  circle).  The  lower  figure  legend  classifies  the  genes 
in  the  pie  charts.  Pie  charts  show  the  breakdown  of  DE  gene  types  that  are 
uniquely  regulate  in  the  confrontations.  Cellular  processes-  Biosynthesis  of 
cofactors,  prosthetic  groups  and  carriers,  Fatty  acid  and  phospholipid 
metabolism,  Amino  acid  biosynthesis,  DNA  metabolism,  Central  intermediary 
metabolism,  Purines,  pyrimidines,  nucleosides,  and  nucleotides,  Transcription; 
Unknown  function-  Hypothetical  proteins,  Unclassified,  Null;  Non-coding  RNAs- 
tRNA,  ncRNA,  tmRNA. 
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genes.  These  include  proteins  of  unknown  function,  ABC  transporter  family,  and  one  protein  annotated  as  a  spore 
coat  protein  (T634_2491).  A  BLASTp  search  and  protein  threading  (using  Phyre  v2.0)  [7]  of  this  sequence  shows 
that  it  is  has  a  cell  adhesion  or  protein  binding  domain,  but  its  function  remains  unknown.  Of  the  32  genes  highly 
upregulated  in  AB-CJ  confrontation,  one  interesting  trend  is  the  upregulation  of  a  number  of  genes  related  to  pili 
and  fimbrial  assembly.  Also  induced  are  three  genes  belonging  to  the  same  fGI,  of  which  two  can  also  be  linked 
with  a  cell  envelope  related  role. 

We  observed  differential  regulation  of  the  siderophore  cluster  3,  Acinetobactocin  biosynthesis  and 
transport  including  the  Fur  regulated  the  bau  genes  occur  in  both  A.  baumannii-E.  hormaechei  and  A.  baumannii- 
K.  pneumoniae  confrontation,  but  the  fold  induction  was  1.5-2  fold,  specifically  in  the  A.  baumannii-E. 
hormaechei  confrontation  (Appendix  II).  From  previous  studies  it  is  evident  that  Acinetobactin  (high-affinity 
iron  chelator)  synthesis  is  essential  for  growth  in  vitro  in  iron-limiting  conditions  as  well  as  for  persistence  in  the 
human  host  and  for  successful  virulence  in  model  in  vivo  infection  systems  [8],  In  the  A.  baumannii-E. 
hormaechei  co-culture  we  also  observed  upregulation  of  the  Aerobactin  biosynthesis  proteins  and  TonB 
siderophore  receptor,  suggesting  that  these  organisms  were  creating  iron-limiting  conditions  in  the  co-culture 
requiring  the  production  of  iron  uptake  systems  for  survival,  similar  to  conditions  that  may  occur  during  infection 
in  the  host. 

Two  of  the  most  highly  upregulated  genes  we  observed  in  A.  baumannii  are  aldehyde  dehydrogenase  and 
an-iron  dependent  alcohol  dehydrogenase.  These  genes  were  upregulated  3.6  fold  against  both  MDROs  and  1- 
fold  against  C.jeikeium  although  the  induction  in  this  case  is  not  specific  to  the  presence  of  ethanol  in  the  medium, 
unless  it  was  produced  by  the  co-culture  participant.  Presence  of  low  concentrations  of  ethanol  favors  growth  of 
A.  baumannii  and  induces  ompA,  and  also  elicits  a  global  stress  response,  increasing  salt  tolerance,  increasing 
virulence  against  potential  competitors  or  predators  in  the  environment,  induction  of  proteins  to  protect  against 
oxidative  stress,  and  molecular  chaperones  to  prevent  protein  misfolding.  This  is  similar  to  the  response  seen 
from  presence  of  reactive  oxygen  species  and  presence  of  antibiotics.  Ethanol  also  induces  lipid  biosynthesis 
genes  that  may  correlate  with  membrane  biogenesis  and  the  formation  of  biofilm  structures  as  another 
environmental  stress  response  [9], 

Downregulated  genes.  Only  19  genes  are  commonly  downregulated  when  AB  confronts  each  of  the  other 
MDROs  and  the  commensal  (Figure  5  Panel  C).  Of  the  180  genes  downregulated  in  response  to  K.  pneumoniae , 
only  55  are  specific  to  this  co-culture.  There  was  no  striking  functional  class  of  genes  that  was  downregulated 
and  included  some  fGI  genes,  but  also  unknown  function,  energy  metabolism,  transport  and  binding  and  genes  in 
categories  that  were  also  upregulated.  When  in  co-culture  with  E.  hormaechei,  only  37  of  the  143  genes 
downregulated  were  specific  to  this  confrontation,  and  there  was  no  distinctive  pattern  of  downregulation  of  a 
particular  functional  category.  In  both  these  cases,  the  energy  metabolism  genes  that  were  downregulated  appear 
to  be  those  that  would  not  be  beneficial  to  the  particular  environment,  unlike  the  metabolic  pathways  that  could 
be  induced  for  competitive  growth  and  survival  against  competing  bacteria  under  limited  nutrient  conditions. 

Similar  to  the  trend  of  upregulated  genes  of  A.  baumannii  in  response  to  C.  jeikeium,  229  genes  were 
downregulated,  of  which  almost  all  (191)  were  specific  to  this  confrontation.  Several  of  these  are  of  unknown 
function,  but  we  also  observed  an  increased  number  of  protein  transport  and  processing,  cell  envelope  related 
genes,  transport  and  binding  proteins,  and  those  categorized  as  involved  in  cellular  processes.  In  the  latter  category 
are  genes  like  CsuE,  universal  stress  proteins,  and  a  p-lactamase.  Genes  associated  with  iron  scavenging  were 
distinctly  downregulated  in  the  A.  baumannii-C. jeikeium  confrontation. 

The  Csu  chaperone  system  associated  with  Type  I  pili  formation  was  upregulated  only  in  the  A. 
baumannii-K.  pneumoniae  confrontation  about  1 .5  fold,  while  distinctly  downregulated  almost  3  fold  when  co¬ 
cultured  with  C.  jeikeium.  The  csu  operon  is  thought  to  be  relevant  for  adherence  and  formation  of  biofilm  on 
abiotic  surfaces  in  A.  baumannii  ATCC  19606T,  but  not  for  human  respiratory  cells  [10],  Biofilm  formation  in 
A.  baumannii  has  been  extensively  studied  and  several  factors  critical  for  this  phenomenon  have  been  identified- 
Csu  encoding  genes,  ompA  (homologous  to  bap  in  Staphylococcus),  autoinducer  synthase  Abal,  a  part  of  the 
quorum  sensing  system  [11],  and  pgaABCD  operon  responsible  for  production  of  PNAG  (poly-p-l,6-N- 
acetylglucosamine)  have  been  proposed  as  critical  for  biofdm  formation  [12],  Type  IV  pili  (TFP)  genes  were 
upregulated  only  in  the  A.  baumannii-C.  jeikeium  confrontation.  These  pili  are  associated  with  natural 
transformation,  twitching  mobility,  and  biofilm  formation  [13],  The  deletion  of  TFP  retraction  gene  pilU  and  pilT 
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leads  to  reduction  in  biofilm  formation,  indicating  that  fully  functional  TFP  are  required  for  biofilm  production 
in  strain  A.  baumannii  ATCC  17978  [14].  Suggesting  that  A.  baumannii  may  form  biofilms  with  C.  jeikeium. 

Response  of  K.  pneumoniae 

Many  of  the  differentially  regulated  genes  in  K.  pneumoniae  indicate  a  general  stress  response-  induction  of  phage 
and  transposon  genes,  universal  stress  proteins,  DNA  damage  repair,  and  stationary  phase  stress  response  [15-17] 
(Appendix  III).  Studies  also  implicate  a  connection  between  environmental  stressors,  phage  induction  and 
biofilm  formation  [18],  [19].  The  induction  of  BhsA  (putative  multiple  stress  resistance  protein)  may  be  another 
indication  of  stress  in  the  growth  environment.  In  Escherichia  coli  it  is  also  annotated  as  YcfR  and  is  known  to 
encode  a  putative  outer  membrane  protein,  and  induced  in  E.  coli  biofilms  and  during  stress  conditions.  The 
deletions  of  this  multiple  stress  resistance  protein  can  make  the  cell  more  sensitive  to  acid,  heat,  H2O2,  and 
cadmium.  YcfR  was  observed  to  be  involved  in  the  regulation  of  E.  coli  K-12  biofilm  formation  and  hence  it  has 
been  proposed  that  that  this  locus  be  named  bhsA,  for  influencing  biofilm  through  hydrophobicity  and  stress 
response  [20],  Some  of  the  responses  might  confer  K.  pneumoniae  with  some  growth  advantage  like  the 
upregulation  of  PQQ  may  allow  participation  in  diverse  carbon  utilization  pathways.  The  anti-restriction  protein 
ArdA  upregulated  against  A.  baumannii  may  prime  the  organism  for  DNA  uptake.  These  ArdA  proteins  and  type 
I R-M  systems  are  more  frequently  discovered  through  genome  sequencing  in  several  bacteria  and  is  hypothesized 
to  be  involved  in  the  regulation  of  gene  transfer  among  bacterial  genomes  [21]. 

Upregulated  genes.  K.  pneumoniae  appears  to  respond  strongly  to  A.  baumannii  and  this  is  evident  from  the 
number  of  genes  that  were  upregulated  (411)  (Figure  5,  panel  D).  Of  these,  367  genes  are  unique  to  the  A. 
baumannii  confrontation.  We  identified  biofilm  regulatory  proteins  BssS  (T643_A0534),  LuxR  family 
transcriptional  regulators,  an  inhibitor  of  vertebrate  lysozyme,  toxin  RelE,  plasmid  stabilization  protein  RelE/ParE 
family,  universal  stress  family  proteins,  a  multiple  stress  resistance  protein  BhsA,  a  protein  that  provides  DNA 
protection  during  starvation,  the  cell  division  inhibitor  protein  Sul  A,  MarR  associated  with  drug  resistance  [22], 
competence/damage  inducible  protein  CinA,  Ton-B  associated  receptors,  hemin  binding  proteins  and  iron  chelate 
associated  proteins,  a  protein  associated  with  toxin-antitoxin  system,  a  conjugal  transfer  entry  exclusion  protein 
of  the  TraS  family,  quaternary  ammonium  compound-resistance  protein  SugE,  a  DDE  domain  protein  [23],  In  the 
co-culture  with  E.  hormaechei,  Klebsiella  showed  a  moderate  response  compared  with  its  response  to 
Acinetobacter  and  upregulates  only  42  genes.  Of  the  22  genes  that  are  uniquely  upregulated  there  is  a  component 
of  a  toxin-antitoxin  system,  universal  stress  proteins,  Fe  SOD,  hemolysin  expression  modulating  protein,  an  O- 
antigen  biosynthesis  protein  WbnF.  The  response  to  C.  jeikeium  was  also  comparably  mild;  only  40  genes  were 
upregulated,  of  which  14  were  unique  to  this  confrontation  and  include  transport  and  binding  proteins,  some 
proteins  of  unknown  function,  DNA  repair  UmuD  and  a  Type  I  restriction  enzyme  like  domain  protein. 

There  are  59  genes  induced  specifically  (>  1.5  log  FC)  in  response  to  A.  baumannii  (Figure  5,  panel  E). 
Most  of  these  were  of  unknown  function.  Other  categories  represented  include  metabolism,  general  cellular 
processes,  transport  and  binding,  mobile  and  extra-chromosomal  elements,  some  factors  associated  with 
regulatory  functions  and  protein  fate.  Some  phage  related  proteins  were  also  upregulated.  K.  pneumoniae  also 
induces  the  multiple  stress  resistance  protein  BhsA  when  in  contact  with  the  MDRO  Acinetobacter  and 
upregulates  the  expression  of  PQQ,  and  the  anti-restriction  protein  ArdA.  Pyrroquinoline  quinone  is  a  low 
molecular  weight  redox  active  cofactor  for  bacterial  dehydrogenases  not  required  for  bacterial  survival,  but  can 
enhance  the  rate  of  cell  growth  by  participating  in  carbon  utilization  pathways.  K.  pneumoniae  has  a  demonstrated 
ability  to  produce  PQQ  and  this  is  accomplished  by  the  pqq  operon  comprising  of  six  genes  pqqA-F  [24]  [25], 
The  TonB  dependent  and  heme  transport  systems  are  also  significantly  upregulated  (3-5  fold).  K.  pneumoniae 
showed  a  milder  response  to  C.  jeikeium  compared  with  its  reaction  to  A.  baumannii  with  only  12  specifically 
induced  genes-  tRNAs,  hypothetical  and  few  categorized  as  transport  and  binding  proteins  (phosphate  transport 
system/  PTS  components).  Only  three  genes  were  uniquely  upregulated  in  response  to  E.  hormaechei  -  a  sodium 
transporter,  universal  stress  protein  and  Hsp20  related  to  proteins  fate. 

Downregulated  genes.  The  downregulation  of  K.  pneumoniae  genes  is  shown  in  Figure  5,  panel  F.  In  response 
to  the  MDRO  E.  hormaechei,  147  genes  were  downregulated,  of  which  82  were  specific  to  this  confrontation. 
More  than  a  third  of  these  are  involved  in  transport  and  binding,  especially  a  number  siderophores  and  iron 
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transporter  related  proteins  are  downregulated  1.5-2  fold.  The  response  to  the  commensal  C.  jeikeium  was 
relatively  mild;  only  28  of  the  118  genes  were  specific  to  this  confrontation  and  mostly  involved  in  basic  cellular 
processes  and  energy  metabolism.  The  response  to  A.  baumannii  was  more  dramatic,  with  432  genes 
downregulated  and  of  these  352  genes  is  specific  to  this  confrontation.  Over  50  of  these  are  involved  in  transport 
and  binding  and  energy  metabolism  and  132  in  basic  cellular  processes,  primarily  protein  synthesis  and  central 
intermediary  metabolism,  transport  and  binding-  ABC  type  transporters  mostly  for  amino  acid,  di-and  oligo¬ 
peptides  transport. 

Many  of  the  differentially  regulated  genes  in  K.  pneumoniae  indicate  a  general  stress  response-  induction 
of  phage  and  transposon  genes,  universal  stress  proteins,  DNA  damage  repair,  and  stationary  phase  stress  response 
[15-17],  Studies  have  also  shown  a 
connection  between  environmental 
stressors,  phage  induction  and 
biofilm  formation  [18],  [19].  The 
induction  of  BhsA  (putative 
multiple  stress  resistance  protein) 
may  be  another  indication  of  stress 
in  the  growth  environment.  In  E.  coli 
it  is  also  annotated  as  YcfR  and  is 
known  to  encode  a  putative  outer 
membrane  protein,  and  induced  in 
E.  coli  biofilms  and  during  stress 
conditions.  The  deletions  of  this 
multiple  stress  resistance  protein  can 
make  the  cell  more  sensitive  to  acid, 
heat,  H2O2,  and  cadmium.  YcfR  was 
observed  to  be  involved  in  the 
regulation  of  E.  coli  K-12  biofilm 
formation  and  hence  it  has  been 
proposed  that  that  this  locus  be 
named  bhsA,  for  influencing  biofilm 
through  hydrophobicity  and  stress 
response  [20],  Some  of  the 
responses  might  confer  K. 
pneumoniae  with  some  growth 
advantage  like  the  upregulation  of 
PQQ  may  allow  participation  in 
diverse  carbon  utilization  pathways. 

The  anti-restriction  protein  ArdA 
upregulated  against  A.  baumannii 
may  prime  the  organism  for  DNA 
uptake.  These  ArdA  proteins  and 
type  I  R-M  systems  are  more 

frequently  discovered  through  genome  sequencing  in  several  bacteria  and  is  hypothesized  to  be  involved  in  the 
regulation  of  gene  transfer  among  bacterial  genomes  [21]. 


KP 

AB 

EH 

a 

logCPM 

logFC  FDR 

logFC 

FDR 

logFC 

FDR 

logFC 

FDR 

KP 

T643 A0903 

entD 

Enterobactin 

4.80 

■ 

0 

^B 

9 

D 

T643  A0904 

fepA 

Enterobactin 

[  9.34 

■ 

9 

I  i 

9 

B 

T643  A0905 

fes 

Enterobactin 

^^B  7.39 

1 

9 

1  1 

9 

1 

T643  A0906 

mbtH 

Enterobactin 

4.76 

■ 

O 

l  i 

9 

D 

T643  A0907 

entF 

Enterobactin 

9.36 

1 

l  i 

9 

■ 

:■ 

T643  A0908 

fepC 

Enterobactin 

6.00 

1 

B 

9 

| 

T643  A0909 

fepG 

Enterobactin 

5.43 

1 

1  1 

9 

I 

T643  A0910 

fepD 

Enterobactin 

5.74 

1 

I  1 

9 

1 

T643  A0911 

entS 

Enterobactin 

^B  5.33 

■ 

Q 

9 

0 

T643  A0912 

fepB 

Enterobactin 

^B  6.25 

1 

1  1 

9 

I 

T643  A0913 

entC 

Enterobactin 

8.10 

■ 

9 

nan 

9 

g 

T643  A0914 

entE 

Enterobactin 

IT23  8.30 

9 

i  i 

9 

B 

9 

T643 A0915 

entB 

Enterobactin 

■I  7.99 

fl 

9 

i  i 

9 

B 

9 

T643_A0916 

entA 

Enterobactin 

7.40 

■ 

9 

9 

■ 

9 

AB 

T634_2721 

basJ 

Acinetobactin 

7.37 

1 

■ 

□ 

9 

T634_2722 

bast 

Acinetobactin 

3.84 

1  1 

■ 

9 

1 

T634_2723 

basH 

Acinetobactin 

5.30 

1 

■ 

9 

B 

O 

T634  2724 

barB 

Acinetobactin 

■  7.31 

■  |p~ 

■ 

9 

■ 

9 

T634 2725 

bar  A 

Acinetobactin 

7.33 

■  9 

9 

■ 

T634_2726 

Acinetobactin 

■  2.77 

B  O 

9 

T634 2727 

basG 

Acinetobactin 

i  m 

■ 

9 

i  1 

9 

T634 2728 

basF 

Acinetobactin 

■  7.02 

i  |p~ 

9 

i 

O 

T634_2729 

basE 

Acinetobactin 

^fl  8.00 

m 

■ 

9 

■ 

9 

T634 2730 

basD 

Acinetobactin 

8.07 

fl  m 

^B 

9 

■ 

9 

T634_2731 

basC 

Acinetobactin 

6.86 

■  • 

■ 

9 

□ 

:■ 

T634 2732 

bauA 

Acinetobactin 

8.20 

■  IT 

■ 

9 

T634 2733 

bauB 

Acinetobactin 

7.45 

i  r 

■ 

9 

i 

T634_2734 

bauE 

Acinetobactin 

^■1 

i 

■ 

9 

i  1 

T634 2735 

bauC 

Acinetobactin 

^B  5.35 

i  Ip 

B 

9 

i 

T634 2736 

bauD 

Acinetobactin 

i 

B 

P 

_ i _ 

T634_2737 

Acinetobactin 

1.99 

i 

■ 

T634 2738 

basB 

Acinetobactin 

8.45 

i  Ip 

9 

D 

9 

T634 2739 

basA 

Acinetobactin 

i 

■ 

P 

T634 2740 

Acinetobactin 

4.57 

1 

1 

T634_2741 

bauF 

Acinetobactin 

6.16 

i 

1 

P 

1 

EH 

T636_A1237 

fepA 

Enterobactin 

■1  6.70 

M  9 

9 

1 

T636_A1238 

[ fes 

Enterobactin 

M  9 

9 

B 

T636 A1239 

mbtH 

Enterobactin 

2.96 

M  B 

9 

■ 

T636_A1240 

entF 

Enterobactin 

7.84 

M  P 

^^B 

9 

■ 

T636 A1241 

fepC 

Enterobactin 

^P  4.39 

■  9 

9 

1 

T636 A1242 

fepG 

Enterobactin 

4.10 

■ 

■ 

: 

:  ~  VSr>.r 

T636 A1243 

fepD 

Enterobactin 

^B  4.18 

■ 

■ 

i 

T636_A1244 

entS 

Enterobactin 

■  3.61 

■ 

^B 

9 

i 

T636_A1245 

Enterobactin 

T636_A1246 

fepB 

Enterobactin 

4.36 

■ 

B 

i 

T636_A1247 

entC 

Enterobactin 

6.00 

m  r 

p 

■ 

T636_A1248 

entE 

Enterobactin 

^B  6.17 

o 

^^^fl 

9 

B 

T636_A1249 

entB 

Enterobactin 

^P  5.87 

M  o 

9 

■ 

T636_A1250 

entA 

Enterobactin 

^^P 

9 

■ 

Figure  6.  Regulation  of  siderophore  gene  cluster  expression  in  K.  pneumoniae 
(enterobactin)  and  A.  baumannii  (acinetobactin)  when  in  confrontation  with  a  challenger 
species.  Siderophore  gene  expression  response  of  K.  pneumoniae  (KP),  A.  baumannii  (ABJ,  E. 
hormaechei  (EH),  and  C.  jeikeium  (CJ).  The  MDROs  were  individually  co-cultured  and  confronted 
with  a  different  MDRO  or  the  commensal  CJ.  Differentially  regulated  genes  shown  with  EdgeR  false 
discovery  rate  (FDR):  <0.05  (pink  circle)  and  <0.01  (yellow  circle).  Log  base  2  fold  changes  (FC) 
with  reference  to  monoculture  were  shown  as  bar  graphs.  Gene  expression  level  was  shown  as 
log  base  2  counts  per  million  (CPM)  averaged  across  all  experiments. 


Differential  expression  of  siderophores 

Siderophores  are  molecules  used  by  cells  for  iron  acquisition  and  are  critical  tools  for  survival  of  bacterial 
pathogens  in  eukaryotic  hosts  where  free  iron  is  extremely  limiting.  Siderophore  gene  clusters  were  upregulated 
when  MDRO  species  were  individually  co-cultured  with  another  MDRO,  except  for  if.  pneumoniae  Enterobactin- 
producing  genes  when  confronted  with  E.  hormaechei  (Figure  6).  In  general,  an  up  regulation  of  siderophore 
expression  was  observed  across  5  out  of  6  of  the  MDRO-MDRO  pairwise  co-culture  assays  performed  (Figure 
6).  An  opposite  response  was  only  observed  whereby  the  K.  pneumoniae  siderophore  gene  cluster  was  strongly 
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down  regulated  by  close  to  8-fold  when  K.  pneumoniae  was  co-cultured  with  E.  hormaechei.  Down  regulation 
of  K.  pneumoniae  and  A.  baumannii  siderophore  gene  clusters  was  also  observed  when  these  MDROs  were 
individually  co-cultured  with  the  commensal  C.  jeikeium.  It  would  be  very  exciting  if  lowering  siderophore 
production  is  a  mechanism  whereby  a  commensal  microorganism  can  out  compete  invading  pathogens  in/on  the 
human  host.  Also  of  note,  both  K.  pneumoniae  and  E.  hormaechei  encode  Enterobactin,  but  only  K.  pneumoniae 
shuts  down  transcription  of  Enterobactin  in  the  presence  of  E.  hormaechei  and  not  vice  versa.  It  is  conceivable 
that  K.  pneumoniae  may  have  evolved  feedback  mechanisms,  in  terms  of  iron  utilization,  and  is  therefore  capable 
of  re-using  the  enterobactin  produced  by  Enterobacter;  however,  Enterobacter  is  not  able  to  do  the  same.  In 
contrast,  the  expression  of  the  A.  baumannii  siderophore  Acinetobactin  was  always  upregulated  regardless  if  the 
challenger  organism  was  K.  pneumoniae  or  E.  hormaechei. 


Table  6.  Reads  mapped  by  reference  genomes 


Confronting  Bacteria 

MDRO  ! 

A.  baumannii 

K.  pneumoniae 

E.  hormaechei 

Reads  mapped  to  A. 
baumannii  genome 

Reads  mapped  to 
confronting  bacteria 

Reads  mapped  to  K. 
pneumoniae  genome 

Reads  mapped  to 
confronting  bacteria 

Reads  mapped  to  E. 
hormaechei  genome 

Reads  mapped  to 
confronting  bacteria 

S.  epidermidis 

Replicate  1 

32,234,355 

82.5% 

509.081 

1.3% 

25,025.649 

77.5% 

42,914  0.1% 

34,150,078  81.7% 

37,189  0.1% 

Replicate  2 

32,579.207 

78.7% 

211.461 

0.5% 

23,568.809 

82.1% 

39,305  0.1% 

21,403.908  81.2% 

8,916  0.03% 

L  reuteri 

Replicate  1 

28,473.8 22 

84.9% 

18,964 

0.1% 

33,577,168 

84.5% 

1,719  0.004% 

15,585.890  72.4% 

15,034  0.1% 

Replicate  2 

27,295.015 

82.4% 

3,144 

0.01% 

25,241,273 

80.2% 

23,909  0.1% 

23,719.926  84.0% 

5,225  0.02% 

C.  jeikeium 

Replicate  1 

41,378.734 

80.2% 

16,330 

0.03% 

22,517,239 

78.3% 

1,350  0.005% 

30,987.505  77.8% 

1,183  0.003% 

Replicate  2 

24,022.732 

81.7% 

1,358 

0.005% 

31,012.045 

80.1% 

2,385  0.01% 

29,476.237  71.1% 

1,091  0.003% 

E.  hormaechei 

Replicate  1 

9,468.615 

44.9% 

8,592.602 

40.7% 

7,696.986 

32.4% 

10,578,989  44.6% 

33,613.582  86.3% 

Replicate  2 

13,758,555 

49.4% 

9,568,925 

34.3% 

13,908.025 

35.0% 

17,650,831  44.5% 

19,182.465  83.3% 

K.  pneumoniae 

Replicate  1 

16,455.140 

45.8% 

13,503.059 

37.6% 

13,603.176 

84.4% 

Replicate  2 

14,591.964 

53.0% 

8,352,266 

30.3% 

23,242.160 

81.5% 

A.  baumannii 

Replicate  1 

34,351.880 

86.6% 

Replicate  2 

30,045.911 

85.9% 

Response  of  Commensals 

Confrontations  of  MDROs  to  commensals  had  a  low  number  of  readings  belonging  to  the  commensals  therefore 
preventing  EdgeR  analysis  of  commensal  genes  (Table  6).  When  EdgeR  analysis  was  performed  for  MDRO 
genes  when  confronted  with  S.  epidermidis  very  few  genes  had  a  FDR  of  <0.05  (Table  7).  For  A.  baumannii 
confronted  with  S.  epidermidis  there  were  12  differentially  expressed  genes,  and  K.  pneumoniae  when  confronted 
with  S.  epidermidis  there  were  112  differentially  expressed  genes.  These  numbers  were  very  small  compared  to 
the  other  confrontations  and  therefore  it  was  concluded  that  the  expression  profile  of  the  MDRO  at  the  six-hour 
time  point  did  not  differ  from  monoculture  expression.  The  response  by  each  organism  is  likely  to  vary  depending 
on  the  growth  phase.  Studies  in  A.  baumannii  strain  ATCC  17978  showed  a  significant  change  in  protein 
production  when  cells  transition  from  log  phase  to  stationary  phase  [26],  Although  there  were  differences  in  the 
growth  rates  of  the  commensals  and  the  MDROs,  the  fact  that  we  still  noticed  differential  expression  of  genes  by 
MDROs  in  response  to  some  of  the  commensals,  like  S.  epidermidis,  shows  that  these  studies  are  still  valuable 
due  to  the  inherent  internal  control  nature  of  these  experiments.  In  K.  pneumoniae,  more  than  90%  of  the 
differentially  regulated  genes  are  downregulated  and  none  are  specific  to  S.  epidermidis,  and  are  classified  as 
cellular  processes,  energy  metabolism,  transport  and  binding.  Only  four  were  uniquely  upregulated,  and  were 
proteins  of  either  unknown  function,  regulatory  or  cell  envelope  categories.  None  of  the  eight  upregulated  genes 
(primarily  energy  metabolism)  in  A.  baumannii  were  unique  to  S.  epidermidis,  but  of  the  three  genes  (all 
chaperones,  GroL,  GroS  and  DnaK)  downregulated,  GroL  and  DnaK  were  specific  for  S.  epidermidis. 

Table  7.  Number  of  differentially  expressed  gene  in  E.  hormaechei,  K.  pneumoniae,  and  A.  baumannii  when 
confronted  with  other  bacteria.  Differentially  expressed  genes  are  defined  as  genes  with  an  FDR  <0.05 
from  EdgeR  analysis. _ 


Confronting  Bacteria 

A.  baumannii 

K. 

pneumoniae 

E. 

hormaechei 

C.  jeikeium 

S. 

epidermidis 

L.  reuteri 

A. 

genes 

baumannii 

388 

322 

424 

11 

1 

K. 

genes 

pneumoniae 

909 

212 

164 

95 

14 

17 


E.  hormaechei 


genes 


78 


15 


0 


4 
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Confirmation  of  differential  expression  by  qPCR 

RNA-Seq  data  for  the  bacterial 
confrontations  was  validated  by  qPCR 
of  select  differentially  expressed  genes. 

Genes  were  chosen  for  analysis  based  on 
the  ability  to  design  unique  primers  and 
their  logFC  values.  Biological 
replicates  were  both  tested  and  qPCR 
was  performed  on  cDNA  from  reverse 
transcription  of  total  RNA.  For  K. 
pneumoniae,  2  downregulated  {acnB, 

T643_A0266;  gabTl,  T643_A4354) 
and  3  upregulated  (SIS  domain  protein, 

T643_A0084;  cimH,  T643_A0191; 
hypothetical  protein,  T643_A0079) 
genes  were  tested.  For  A.  baumannii,  3 
downregulated  (PapD,  T6342490; 
mmsA_  1,  T6340140;  EamA-like 

transporter,  T6343500)  and  4 
upregulated  (PilN,  T634  3585;  NAD 
family  protein,  T634_2358;  Hpt  domain  protein,  T634  3235;  hypothetical  protein,  T6340987)  genes  were 
tested.  Gene  expression  was  normalized  to  gyrB  and  relative  expression  ratios  were  determined  by  comparing 
co-culture  samples  with  monoculture  samples.  Relative  expression  ratios  were  converted  to  logFC  and  compared 
to  the  logFC  from  RNA-Seq  data.  All  genes  tested  showed  the  same  direction  of  logFC  in  both  RNA-Seq  and 
qPCR  analysis  (Figure  7),  confirming  the  trends  observed  in  the  RNA-Seq  analysis. 

Conclusions  for  bacteria-bacteria  confrontations 

The  most  prominent  results  from  this  study  were- 1)  The  response  of  A.  baumannii  to  the  two  MDROs  (E. 
hormaechei  and  K.  pneumoniae )  and  the  commensal  C.  jeikeium  was  significantly  stronger  compared  with  the 
response  to  S.  epidermidis,  2)  K.  pneumoniae  raised  a  more  robust  response  to  A.  baumannii  than  vice  versa  and 
responds  in  a  comparable  manner  to  the  three  other  organisms.  The  unique  reactions,  depending  on  the  type  of 
organism  in  the  co-culture,  clearly  shows  the  importance  of  confrontation  type  experiments  to  understand 
response  and  regulation  of  organisms  during  infections.  It  is  evidence  towards  our  hypothesis  that  no  single 
genetic  response  by  an  organism  would  be  an  effective  means  for  dealing  with  every  possible  microbial  encounter. 

In  our  targeted  analysis  of  the  RNA-Seq  of  the  MDROs  response  to  other  MDROs  and  the  commensals, 
we  found  some  differential  regulation  of  factors  associated  with  virulence  and  upregulation  of  genes  that  might 
confer  a  growth  advantage  over  the  confronting  species,  indicating  that  these  types  of  interactions  can  influence 
the  pathogen’s  response  during  infection.  Upregulation  of  some  genes  in  response  to  other  organisms  in  the 
microbiome  might  also  prime  the  pathogen  to  be  prepared  for  antibiotic  treatment  through  global  regulation  of 
some  stress  response. 

This  work  is  one  of  the  few  studies  of  bacterial-bacterial  confrontation  observing  differential 
transcriptional  regulation  via  RNA-Seq  analysis.  Using  RNA-Seq  we  were  able  to  identify  unique  matches  in 
each  participant  and  simultaneously  map  both  upregulated  and  downregulated  genes.  Many  proteins  of  unknown 
function  and  non-coding  RNAs  were  differentially  regulated,  suggesting  a  need  for  deeper  genomics  analysis  to 
unravel  the  yet  unknown  factors  influencing  infection  process  and  drug  resistance.  This  study  highlights  the  fact 
that  an  organism  responds  differently  to  each  source  of  stimulus  in  the  environment,  and  knowing  this  response 
can  help  comprehend  the  influence  of  surrounding  stimuli  on  a  pathogen’s  ability  to  cause  disease.  This  type  of 
analysis  will  have  extensive  value  in  studying  pathogenesis  in  terms  of  both  host-pathogen  and  pathogen- 


Figure  7.  Validation  of  DE  genes.  Lines  indicated  logFC  expression  from  EdgeR  analysis 
of  RNA-Seq  data  and  bars  indicate  logFC  expression  as  measured  by  qPCR  where  blue 
bars  are  K.  pneumoniae  genes  and  gray  bars  are  A.  baumannii  genes. 
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microbiome  interactions,  and  can  be  particularly  beneficial  to  observe  infections  where  the  pathogen  is  not  in 
isolation  such  as  in  wound  infections. 

Aim  2.  Months  12-24.  Perform  bacterial-eukaryotic  confrontation  assays  to  monitor  bacterial-host  cell 
interactions  between  MDROs  and  human  skin  cell  cultures. 

Task  3a.  Months  12-13.  Primary  human  skin  cell  culture.  Obtain  commercially  available  cryo-preserved  cell 
culture  stock  and  establish  primary  culture  for  adult  human  skin  cells.  [Task  COMPLETED] 

Frozen  cell  lines  for  adult  human  dermal  fibroblasts  (HDFa)  and  adult  human  epidermal  keratinocytes  (HEKa) 
were  obtained  from  Invitrogen,  catalog  numbers  C0135C  and  C0055C,  respectively.  Cells  were  expanded  and 
frozen,  but  consistent  recovery  of  human  dermal  fibroblasts  was  the  best  and  deemed  most  suited  to  conduct 
confrontation  experiments. 


Task  3b.  Months  13-19.  Bacterial-eukaryotic  confrontation  assay.  Propagate  bacterial  isolates  on  solid  media. 
Set  up  biological  replicates  representing  pairwise  bacterial-eukaryotic  co-culture  between  individual  MDROs 
and  primary  human  skin  cell  cultures.  Set  up  controls  with  MDROs  under  the  same  tissue  culture  conditions  in 
the  absence  of  human  skin  culture.  Harvest  bacterial  and  human  skin  cells  from  co-culture  assays.  Repeat  Task 
3b  for  experimental  replicates.  [Task  COMPLETED] 

Subtask.  Carry  out  pilot  experiments  to  determine  sub-lethal  infection  (multiplicity)  ratio  of  bacterial  and 
human  cells  for  co-culture  experiments.  [Subtask  COMPLETED] 

HDF  cell  cultures  were  set-up  to  test  multiplicity  of  infection  (MOI)  as  well  as  incubation  times  for  individual 
MDROs.  The  MOIs  for  all  three  MDROs  with  HDF  cells  was  determined  to  be  optimal  at  100:1  (MDRO:HDF). 
The  length  of  confrontation  was  tested  whereby  one-hour  duration  was  determined  to  be  optimal.  Prolonged 
confrontation  times  resulted  in  acidification  of  the  cell  culture  media,  which  was  avoided  to  eliminate  interference 
with  gene  expression  due  to  pH  changes  and  direct  interactions  among  the  MDROs. 

Subtask.  Determine  that  bacterial  isolates  adhere  to  human  cells  prior  to  isolation  of  RNA.  [Subtask 
COMPLETED] 

Bacterial  retention  control  experiments  were  repeated  and  the  MDROs  were  observed  via  light  microscopy  to 
be  adherent  to  the  HDF  cells,  even  after  3  rinses  with  PBS  (data  not  shown).  Cells  were  isolated  for  RNA 
extraction  from  confrontations  after  removing  excess  culture  media  so  only  adherent/close  proximity  bacterial 
cells  were  harvested.  Cells  were  then  treated  with  RNAprotect  Cell  Reagent  (Qiagen)  to  immediately  stabilize 
RNA  transcripts  before  further  processing. 

Bacterial-HDF  confrontations 

Three  MDROs  ( Acinetobacter  baumannii  MRSN  7339,  Klebsiella  pneumoniae  MRSN  1319,  and  Enterobacter 
hormaechei  MRSN  1 1489)  were  confronted  with  Human  Dermal  Fibroblasts,  adult  (HDFa)  (C-013-5C,  Gibco). 

HDFa  Culture  Growth 

Primary  HDFa  were  expanded  in  4  x  25  cm2  flasks  in  Medium  106  (M- 106-500,  Gibco)  supplemented  with 
Low  Serum  Growth  Supplement  (LSGS)  (S-003-10,  Gibco)  and  Gentamicin/ Amphotericin  B  (50-0640,  Gibco). 
Cells  were  incubated  at  37°C,  5%  CO2,  90%  RH.  Media  was  changed  every  2  days  until  cells  reached  80% 
confluence  (6  days),  at  which  point  they  were  harvest  with  0.05%  Trypsin-EDTA  (25300-054,  Gibco).  Trypsin 
was  neutralized  with  Trypsin  Neutralizer  Solution  IX  (R-002-100,  Gibco)  and  media.  Cells  were  split  into  4  x 
75  cm2  flasks  and  incubated  at  37°C,  5%  CO2,  90%  RH  for  their  second  passage.  Media  was  changed  at  24  h 
and  then  every  48  hours  until  at  90%  confluence  (6  days)  at  which  point  they  were  harvested  with  Trypsin 
again.  Cells  were  then  spun  down  at  180x  g  for  10  min  and  the  media  removed  and  cells  resuspended  in  Synth- 
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a-freeze  (A12542-01,  LifeTechnologies).  Resuspended  cells  were  at  a  concentration  of  9.6  x  105  viable  cells/ml 
and  split  into  1  ml  aliquots  were  gradually  cooled  to  -80°C  and  stored  in  liquid  nitrogen  (vapor  phase)  storage. 

For  confrontations  an  aliquot  of  twice  passaged  HDFa  cells  were  thawed  in  37°C  water  bath  and  expanded  in  a 
75  cm2  flask  with  Medium  106  (M-106-500,  Gibco)  supplemented  with  Low  Serum  Growth  Supplement 
(LSGS)  (S-003-10,  Gibco)  at  37°C,  5%  CO2,  90%  RH.  Media  was  changed  at  24  h  and  then  every  48  h  until 
cells  reached  80%  confluence  at  which  point  cells  were  harvested  with  trypsin  as  described  above  and  split  into 
25  cm2  flasks.  Media  was  changed  at  24  h  and  every  48  hours  after  that.  When  cells  were  80%  confluent  media 
was  changed  16  h  before  being  confronted  with  the  MDROs. 

A.  baumannii,  K.  pneumonia,  and  E.  hormaechei  cells  were  grown  in  BHI  broth  (BD)  to  a  concentration  of  1  x 
108  CFU/ml  in  log  phase  growth.  0.5  ml  of  each  culture  was  spun  down  for  3  min  at  10k  x  RCF,  BHI  was 
removed  and  cells  were  resuspended  in  Media  106  +  LSGS.  Bacteria  was  added  at  an  MOI  of  100: 1 
(bacteria:HDF)  to  the  25  cm2  flask  of  HDFa  and  incubated  for  1  h  at  37°C,  5%  CO2,  90%  RH.  Bacteria  only 
controls  were  also  added  to  empty  25  cm2  flasks  in  cell  culture  media  and  incubated  for  the  same  time  in  the 
same  conditions. 

RNA  Isolation.  Total  RNA  was  isolated  using  Proteinase  K  in  conjunction  with  lysozyme  to  lyse  cells  instead 
of  bead  beating,  followed  by  Trizol/chloroform  extraction.  RNA  was  cleaned  using  a  Qiagen  RNeasy  Kit  and 
then  residual  gDNA  was  removed  by  TURBO  DNase  treatment  as  was  done  previously  for  bacteria-bacteria 
confrontation  RNA  preparations. 

Task  3c.  Months  19-24.  Bacterial-host  RNA-Seq  and  gene  expression  analysis.  Isolate  human  polyA  mRNA 
from  samples  to  collect  transcripts  derived  from  human  skin  cells.  Isolate  bacterial  mRNA  from  samples  by 
depletion  of  human  mRNA  and  bacterial  and  eukaryotic  rRNAs  to  collect  transcripts  derived  from  the  MDRO 
isolates.  Combine  bacterial  and  human  mRNA,  and  construct  barcoded  ssRNA-Seq  libraries  for  Illumina 
sequencing.  [Task  COMPLETED] 

rRNA  Depletion  and  Library  Construction.  Prokaryotic  and  Eukaryotic  rRNAs  were  depleted  prior  to 
library  construction  using  the  Ribo-Zero  Gold  rRNA  Removal  Kit  (Epicentre).  Fourteen  directional  RNA-Seq 
Illumina  libraries  were  generated  using  NEBNext  Ultra  RNA  Directional  Library  Preparation  Kit.  Libraries 
were  run  on  a  total  of  4  lanes  of  Illumina  HiSeq  2000  Paired  End  Sequencing  (2xl00bp).  Sequencing  reads 
were  binned  and  mapped  to  human  and  bacterial  references  genomes  as  described  for  bacteria-bacteria 
confrontations  except  where  noted  in  the  results. 

Bacterial  responses  to  HDFa  cells 

Transcriptome-wide  gene  responses  were  studied  in  confrontation  assays  when  MDROs  were  individually  co¬ 
cultured  with  cultured  HDFa  cells.  Two  gene  clusters  in  K.  pneumoniae  were  specifically  down  regulated  when 
K.  pneumoniae  was  co-cultured  with  HDFa  cells.  The  down  regulated  K.  pneumoniae  gene  clusters  were 
involved  in  fimbriae  production  and  capsule  synthesis  (Figure  8).  Both  of  these  bacterial  products  carry 
Pathogen- Associated  Molecular  Patterns  (PAMPs),  which  are  bacterial  products  that  are  recognized  by  the 
human  innate  immune  system  [27,  28],  We  observed  similar  downregulations  in  the  A.  baumannii  Csu  and 
Type  IV  pili  gene  clusters  and  an  E.  hormaechei  fimbrial  operon,  but  the  results  were  not  below  the  0.05  FDR 
significance  cutoff  (data  not  shown). 
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(A)  Fimbriae  production 
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(B)  Capsule  biosynthesis 
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Figure  8.  K.  pneumoniae  (KP)  differentially  expressed  genes  when  confronted  with  HDF  cells. 

Expression  of  KP  fimbriae  production  (A)  and  capsule  biosynthesis  (B)  gene  clusters  in  response  to 
co-cultured  HDF  cells.  Differentially  regulated  genes  shown  with  EdgeR  false  discovery  rate  (FDR): 
<0.05  (pink  circle)  and  <0.01  (yellow  circle).  Log  base  2  fold  changes  (FC)  with  reference  to 
monoculture  were  shown  as  bar  graphs.  Gene  expression  level  was  shown  as  log  base  2  counts  per 
million  (CPM)  averaged  across  all  experiments. 

genes  can  be  mapped  to  the  NOD-like  receptor  signaling  pathway  in  KEGG 


HDFa  responses  to  MDROs 

Analysis  of  HDFa-derived  reads 
revealed  a  set  of  21  human  genes 
that  were  up-regulated  and  shared 
across  all  three  MDRO-skin  co¬ 
culture  assays.  Pathway  and  GO 
term  enrichment  analyses  of  the 
gene  list  using  the  DAVID 
Bioinformatics  Resources 
Database  [29,  30]  showed  that  the 
shared  gene  set  was  enriched  in 
cytokine  and  chemokine  signaling 
pathways,  and  inflammatory 
responses  (Table  8).  Out  of  the  21 
upregulated  genes,  a  total  of  9 
(Figure  9). 


Table  8.  Pathway  and  GO  enrichment  analysis  of  a  shared  human  gene  set  upregulated  by  MDROs  K. 
pneumoniae,  A.  baumannii ,  and  E.  hormaechei. 


(A)  Top  3  enriched  pathways 


Pathway 

p-value 

Gene 

matches 

NOD-like  receptor  signaling  pathway 

1.018200e-13 

9 

Cytokine-cytokine  receptor  interaction 

5.642554e-6 

8 

Chemokine  signaling  pathway 

1.831407e-5 

7 

(B)  Top  3  enriched  GO  terms  (biological  processes) 


GO  term 

p-value 

Gene 

matches 

response  to  lipopolysaccharide  [G0:0032496] 

1.949874e-10 

11 

response  to  molecule  of  bacterial  origin  [GO: 000223 7] 

3.102596e-10 

11 

inflammatory  response  [G0:0006954] 

9.053733e-9 

12 

Figure  9.  Mapping  of  human  upregulated  genes  in  MDRO-HDFa  confrontation  assays.  The  KEGG  NOD-like  receptor  signaling 
pathway  is  illustrated  with  the  nine  cytokine  and  chemokine  genes  that  were  up  regulated  during  bacterial  confrontations  (red  star). 
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KEY  RESEARCH  ACCOMPLISHMENTS 

•  Generated  high  quality  Illumina  HiSeq  genome  reference  assemblies  and  annotations  for  25  MDROs  of 
highest  military  importance. 

•  Developed  a  novel  graph-based  algorithm  and  used  it  to  assemble  the  first  consensus  pan-chromosome 
of  A.  baumannii,  identifying  both  the  order  and  orientation  of  core  genes  and  flexible  genomic  regions, 
enabling  us  to  identify  gene  clusters  responsible  for  carbon  utilization,  siderophore  production,  and  pilus 
assembly  demonstrating  frequent  gain  or  loss  among  isolates. 

•  Demonstrated  the  existence  of  novel  resistance  islands  and  isolates  with  increased  numbers  of  resistance 
island  insertions  over  time,  from  single  insertions  in  the  1950s  to  triple  insertions  in  201 1. 

•  Conducted  MDRO-MDRO  and  MDRO-commensal  confrontation  assays  that  demonstrated  differential 
expression  of  genes  involved  in  cell  surface  attachment/biofilm  (e.g.,  pili),  iron  scavenging  (e.g., 
siderophore)  and  stress  response  pathways  (e.g.,  universal  stress  family  proteins,  BhsA). 

•  Conducted  MDRO-human  cell  confrontation  assays  that  showed  down  regulation  of  bacterial 
attachment  proteins  and  an  increase  in  the  expression  of  human  innate  defenses. 

REPORTABLE  OUTCOMES 

•  A  set  of  annotated  genomes  of  25  MDROs  of  military  importance  submitted  to  NCBI  GenBank  and  shared 
with  MRSN  collaborators. 

•  A  set  of  highly  variable  genomic  regions  including  K  and  O  antigen  loci,  which  control  virulence  and 
antigenicity,  were  identified  based  on  analysis  across  over  240  military  and  civilian  A.  baumannii  isolates. 

•  A  comprehensive  catalog  of  genomic  features  representing  antibiotic  resistance  islands  (RIs)  and 
virulence  factors  (e.g.  siderophores,  cell  surface  attachments/biofilm,  etc.)  for  seven  A.  baumannii  strains 
isolated  from  US  military  personnel  and  provided  by  WRAIR  MRSN. 

•  A  group  of  differentially  regulated  genes  when  MDROs  are  in  confrontation  with  other  MDROs, 
commensal  bacteria,  or  adult  human  dermal  fibroblasts. 

•  A  database  of  antibiotic  resistance  genes  based  on  manually  curated  sets  of  proteins  described  in  the 
literature.  This  enabled  the  identification  of  several  antibiotic  resistance  genes  previously  missed  by  the 
public  CARD  database. 

•  Based  on  our  pan-chromosome  and  variable  region  gene  content  identification  algorithms  supported  by 
this  award,  we  applied  for  the  Department  of  Energy  Joint  Genome  Institute  (DOE  JGI)  Emerging 
Technologies  Opportunity  Program  (ETOP)  2015  award. 

•  We  also  sought  funding  from  NIAID  to  study  the  spread  of  AMR  genes  within  the  microbiome, 
leveraging  databases  of  AMR  genes  generated  based  on  work  supported  by  this  award. 

•  Both  Jessica  DePew  and  Radha  Krishnakumar  received  employment  opportunities  based  on  experience 
and  training  in  part  supported  by  this  award. 

CONCLUSION 

Our  project  goal  was  to  identify  and  characterize  factors  that  influence  virulence  (including  antibiotic 
resistance,  and  biofilm  formation)  in  multidrug-resistant  organisms  (MDROs),  specifically  microbial  or  host- 
derived  signals  that  induce  the  expression  of  virulence-associated  genes.  We  were  able  to  demonstrate  that  two 
critical  pathways  for  pathogenesis,  siderophores  (for  iron  sequestration)  and  pili  (for  attachment  and  biofilm 
formation)  are  reduced  in  expression  in  MDROs  by  a  commensal  organism  and  by  cultured  HDFa  cells.  We 
also  showed  that  these  same  factors  vary  among  A.  baumannii  isolates,  especially  military  isolates,  indicating 
high  importance  and  perhaps  selection  in  this  pathogen.  The  tools  and  knowledge  gained  from  pan-genome 
analysis  and  co-culture  expression  data  will  enhance  the  detection,  characterization,  and  reporting  of  multidrug- 
resistant  organisms  in  the  Military  Health  System.  This  research  will  ultimately  enhance  the  knowledge 
products  the  MRSN  provides  to  its  customers  and  stakeholders,  and  may  lead  to  future  drugs  to  be  used  in 
conjunction  with  antibiotic  therapies  that  prevent  the  expression  of  drug  resistance  genes.  Importantly,  we  have 
provided  evidence  that  the  expression  of  siderophores  and  pili/fimbriae  can  be  regulated  by  factors  expressed  by 
a  commensal  microorganism  or  HDFa  cells.  Future  efforts  should  be  focused  on  identifying  these  factors  so  that 
a  potential  therapeutic  can  be  developed  that  will  weaken  the  pathogens  ability  to  obtain  iron  and  to  colonize  the 
wound.  Likewise,  future  studies  should  investigate  C.  jeikeium  as  a  potential  probiotic  to  prevent  wound 
infections. 
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A  novel  method  of  consensus  pan¬ 
chromosome  assembly  and  large-scale 
comparative  analysis  reveal  the  highly  flexible 
pan-genome  of  Acinetobacter  baumannii 
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Abstract 

Background:  Infections  by  pan-drug  resistant  Acinetobacter  baumannii  plague  military  and  civilian  healthcare 
systems.  Previous  A.  baumannii  pan-genomic  studies  used  modest  sample  sizes  of  low  diversity  and  comparisons  to 
a  single  reference  genome,  limiting  our  understanding  of  gene  order  and  content.  A  consensus  representation  of 
multiple  genomes  will  provide  a  better  framework  for  comparison.  A  large-scale  comparative  study  will  identify 
genomic  determinants  associated  with  their  diversity  and  adaptation  as  a  successful  pathogen. 

Results:  We  determine  draft-level  genomic  sequence  of  50  diverse  military  isolates  and  conduct  the  largest 
bacterial  pan-genome  analysis  of  249  genomes.  The  pan-genome  of  A.  baumannii  is  open  when  the  input 
genomes  are  normalized  for  diversity  with  1867  core  proteins  and  a  paralog-collapsed  pan-genome  size  of 
1 1,694  proteins.  We  developed  a  novel  graph-based  algorithm  and  use  it  to  assemble  the  first  consensus 
pan-chromosome,  identifying  both  the  order  and  orientation  of  core  genes  and  flexible  genomic  regions.  Comparative 
genome  analyses  demonstrate  the  existence  of  novel  resistance  islands  and  isolates  with  increased  numbers  of 
resistance  island  insertions  over  time,  from  single  insertions  in  the  1950s  to  triple  insertions  in  201 1.  Gene  clusters 
responsible  for  carbon  utilization,  siderophore  production,  and  pilus  assembly  demonstrate  frequent  gain  or  loss 
among  isolates. 

Conclusions:  The  highly  variable  and  dynamic  nature  of  the  A.  baumannii  genome  may  be  the  result  of  its 
success  in  rapidly  adapting  to  both  abiotic  and  biotic  environments  through  the  gain  and  loss  of  gene  clusters 
controlling  fitness.  Importantly,  some  archaic  adaptation  mechanisms  appear  to  have  reemerged  among  recent 
isolates. 


Background 

Acinetobacter  baumannii  is  a  Gram-negative,  non¬ 
fermenting  coccobacillus  that  can  be  found  in  soil  and 
water,  but  in  recent  decades  has  been  recognized  as  an 
emerging  multidrug- resistant  (MDR)  nosocomial  patho¬ 
gen  causing  pneumonia,  bacteremia,  meningitis,  and  skin/ 
soft- tissue  infection  associated  with  trauma  [1-5].  The 
Centers  for  Disease  Control  and  Prevention  (CDC) 
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estimates  that  each  year  in  the  US  there  are  12,000 
healthcare-associated  Acinetobacter  infections,  63  %  of 
which  are  MDR  [6].  In  2010  an  expert  panel  deemed 
MDR  organisms  one  of  the  top  five  infectious  threats  to 
the  US  Military  [7].  Infections  with  A.  baumannii  resistant 
to  nearly  every  available  antibiotic  complicate  the  care  of 
many  patients  [8,  9].  Surveillance  for  asymptomatic 
colonization  among  injured  service  members  reveals  A, 
baumannii  to  be  one  of  the  common  Gram-negative 
MDR  pathogens  isolated  along  with  Acinetobacter  calcoa- 
ceticus  and  Klebsiella  pneumoniae  [10]. 

The  genetic  factors  that  contribute  to  the  success  of 
A.  baumannii  as  a  pathogen,  such  as  biofilm  formation, 
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ability  to  compete  for  and  sequester  iron  in  nutrient- 
deprived  environments,  and  resistance  to  multiple  broad- 
spectrum  antibiotics,  have  been  areas  of  intense  study.  In 
a  recently  published  study  of  97  clinical  isolates  collected 
from  military  treatment  facilities,  80  %  were  found  to  be 
MDR  with  markers  known  to  confer  resistance  to  13- 
lactams,  aminoglycosides,  macrolides,  tetracycline,  phe- 
nicol,  quaternary  amines,  streptothricin,  sulfonamides, 
and  diaminopyrimidine  [11].  Drug  resistance  is  manifested 
by  a  number  of  well-characterized  mechanisms,  including 
inactivation  of  drugs  (e.g.,  (3-lactamases,  cephalospori- 
nases,  carbapenemases),  prevention  of  drug  entry  through 
outer  membrane  alterations,  removal  of  the  drugs  via  ef¬ 
flux  pumps,  and  mutations  in  drug  targets  [12-18].  In 
addition,  A.  baumannii  has  the  capacity  to  up-regulate 
expression  of  resistance  mechanisms  [19-24]  and  acquire 
new  determinants  on  genomic  regions  called  resistance 
islands  (RIs)  [25],  especially  in  environments  such  as  hos¬ 
pitals  where  broad  spectrum  antibiotics  are  in  use  [26] . 

Previous  A.  baumannii  comparative  genomics  studies 
used  modest  sample  sizes  to  study  representative 
strains  causing  infections  worldwide.  Di  Nocera  et  al. 
[27]  compared  seven  A.  baumannii  strains,  including 
three  of  the  most  frequent  strains  responsible  for 
epidemics  in  Mediterranean  hospitals.  Sahl  et  al.  [28] 
compared  23  isolates,  including  three  they  sequenced, 
for  the  presence/absence  of  invasion-  and  colonization- 
specific  genes  and  conducted  a  pan-genome  analysis  of 
six  complete  genomes.  Whole  genome  phylogenetic 
analysis  of  136  Acinetobacter  genomes  was  used  to  shed 
light  on  the  expansion  of  the  genus  occurring  through 
the  gain  and  loss  of  genes  and  conservation  of  pathogen¬ 
esis  associated  genes  in  the  Acinetobacter  calcoaceticus- 
baumannii  complex  [29].  Recently,  pan-genome  analysis 
on  34  [30]  and  35  [31]  A.  baumannii  isolates  was 
conducted. 

Since  the  use  of  a  single  reference  genome  would  limit 
our  understanding  of  gene  order  and  content  to  a  single 
isolate,  comparisons  with  all  available  related  genomes 
would  be  preferable.  Thus,  a  consensus  representation  of 
multiple  genomes  would  provide  a  better  framework  for 
comparison  than  a  single  reference  genome.  Methods 
for  constructing  the  consensus  of  bacterial  strains  do 
not  yet  exist  as  far  as  we  know;  however,  methods  do 
exist  to  reconstruct  contiguous  regions  of  ancestral 
eukaryotic  genomes  based  on  evolutionary  breakpoints 
or  rearrangements  [32-34].  These  methods  would  fail  to 
assemble  a  consensus  prokaryotic  genome  by  not  cap¬ 
turing  variable  regions  acquired  via  horizontal  gene 
transfer  events  that  were  nonexistent  in  the  ancestor.  In 
addition,  methods  that  rely  on  rearrangements  will  not 
work  with  draft  genomes.  These  limitations  necessitated 
the  development  of  a  new  program,  gene_order.pl ,  which 
computes  the  consensus  pan-genome  from  the  output 


generated  by  our  pan-genome  ortholog  clustering  tool, 
PanOCT  [35]. 

Here  we  compare  genomic  features  from  the  largest 
number  of  A.  baumannii  isolates  of  clinical  and  military 
relevance  using  a  pan-genome  analysis  of  249  publicly 
available  A.  baumannii  isolates,  of  which  50  were  se¬ 
quenced  at  the  J.  Craig  Venter  Institute  (JCVI)  for  this 
study.  The  249  isolates  were  collected  over  several  de¬ 
cades  and  also  represented  a  global  collection  obtained 
from  hospitals  in  the  US  and  around  the  world.  First, 
using  gene_order.pl  as  described  above,  we  assembled 
the  first  consensus  “pan-chromosome”  independent  of 
any  pre-assigned  genome  reference  and  identified  both 
invariant  (core)  and  variable  (flexible)  regions  within  the 
chromosome,  which  are  key  components  that  define  a 
bacterial  strain.  Second,  we  utilized  a  comparative  gen¬ 
omics  approach  on  249  genomes  to  analyze  the  diversity 
of  RIs  and  virulence  factors  of  A.  baumannii.  Our  results 
revealed  that  decades-old  isolates  already  encoded  a  vast 
collection  of  genetic  determinants  and  mechanisms  to 
confer  antibiotic  resistance  and  survival  adaptations.  We 
demonstrated  the  existence  of  novel  RIs  and  isolates  with 
increased  number  of  RI  insertions  over  time.  Clusters  of 
genes  for  carbon  source  utilization,  siderophore  produc¬ 
tion,  pilus  assembly  and  resistance  mechanisms  were 
highly  variable,  and  some  of  these  may  have  reemerged, 
sometimes  in  different  genomic  locations,  among  modern 
isolates.  These  analyses  will  provide  insight  into  the  evolu¬ 
tion  of  A.  baumannii  as  a  nosocomial  pathogen  and  dir¬ 
ectly  aid  the  future  efforts  for  large-scale  epidemiological 
studies  of  this  continuously  evolving  MDR  organism. 

Results 

Genome  sequencing  of  new  A.  baumannii  isolates  from 
the  military  healthcare  system 

A  total  of  50  isolates  identified  as  A.  baumannii  from  the 
US  military  healthcare  system  were  chosen  for  whole  gen¬ 
ome  shotgun  sequencing  based  on  novel  clustering  by 
pulsed-field  gel  electrophoresis  (PFGE;  Additional  file  1), 
increased  prevalence  in  the  military  healthcare  system,  or 
pan-drug  resistance  profiles  (e.g.,  Multidrug-resistant 
Organism  and  Surveillance  Network  (MRSN)  isolates; 
Table  1;  Additional  file  2).  These  strains  were  isolated 
between  2003  and  2011  and  comprised  23  different 
known  sequence  types  (STs)  from  multilocus  sequence 
typing  (MLST)  with  one  potentially  novel  predicted  ST. 
Seventeen  of  the  isolates  were  sequenced  with  a  genome 
finishing  status  of  “improved  high-quality  draft”  (IHQD) 
(Table  1;  Additional  file  2),  which  included  manual  finish¬ 
ing  through  sequence  gap  closure,  PCR  to  link  physical 
ends,  or  automated  gap  closure.The  remaining  isolates 
were  sequenced  to  a  “high-quality  draff”  (HQD)  status. 
On  average,  the  genomes  assembled  into  65  contigs  (range 
3  to  197),  4,023,048  bp  in  length  (range  3,740,684  to 
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4,454,613  bp)  with  3885  predicted  protein-coding  se¬ 
quences  (range  3505  to  4406).  Antibiotic  susceptibility 
profiles  and  predicted  resistance  mechanisms  are  pre¬ 
sented  in  Additional  file  3.  For  one  isolate,  Naval-83,  an 
amino  acid  substitution  previously  not  observed  in  Acine- 
tobacter  (Glu88Lys)  was  identified  in  parC ,  which  was 
recently  shown  to  confer  resistance  to  levofloxacin  in 
Haemophilus  influenza  [36]. 

Pan-genome 

Despite  the  intensive  effort  to  characterize  A.  baumannii 
and  the  sizable  number  of  whole  genome  comparisons 
published  in  the  past  decade  [26,  29,  37-40],  the  size  of 
the  pan-genome  remains  unknown.  We  set  out  to  deter¬ 
mine  the  pan-genome  of  A.  baumannii.  Using  PanOCT 
[35],  a  total  of  22,281  orthologous  protein  clusters  were 
identified  from  a  collection  of  all  A.  baumannii  genomes 
publicly  available  at  the  time  of  the  analysis,  which  in¬ 
cluded  50  sequenced  in  this  study  plus  199  genomes  ob¬ 
tained  from  GenBank,  totaling  249  genomes  (Additional 
files  4  and  5). 

PanOCT  only  includes  non-paralogs  in  clusters  and 
uses  conserved  gene  neighborhood  to  separate  duplicated 
genes.  This  means  that  insertion  sequence  (IS)  elements 
that  are  in  novel  contexts  will  often  form  singleton  clus¬ 
ters  even  though  they  are  identical  in  sequence  to  other  IS 
elements  within  or  between  genomes  analyzed.  When  the 
“core”  pan-genome  is  defined  to  be  all  249  genomes  an¬ 
alyzed  (100  %),  there  were  1867  core/universal  protein 
clusters  and  10,602  singleton  clusters  (i.e.,  clusters  with 
a  single  member  from  a  single  genome)  (Fig.  la).  If  the 
core  pan-genome  were  instead  defined  as  clusters 
having  protein  members  from  95  %  or  75  %  of  the  ge¬ 
nomes  analyzed,  the  core  pan-genome  would  be  2833 
and  3126,  respectively. 

For  the  analysis  of  pan-genome  size,  we  followed  the 
convention  of  merging  clusters  of  paralogous  proteins, 
which  greatly  reduced  the  number  of  clusters  from 
22,281  to  11,694.  To  predict  the  theoretical  maximum 
pan-genome  size  (i.e.,  the  total  number  of  genes,  includ¬ 
ing  core/universal,  novel/unique/strain-specific  and  per¬ 
iphery/dispensable  genes)  a  pan-genome  model  was 
implemented  using  medians  and  an  exponential  decay 
function  [41]  (Fig.  lb).  The  maximum  pan-genome  size 
was  estimated  to  be  12,554  ±  65  genes.  To  determine 
whether  the  A.  baumannii  pan-genome  is  open  or 
closed,  the  number  of  new  genes  identified  (i.e.,  unique 
or  strain-specific  genes)  for  each  genome  added  was 
determined  and  fit  to  a  power  law  function  (n  =  i<N'a)  as 
described  previously  [42]  (Fig.  lb).  Conceptually,  a  pan¬ 
genome  is  closed  when  sequencing  the  genomes  of 
additional  isolates  fails  to  expand  the  pan-genome  (i.e., 
the  entire  gene  repertoire  has  been  discovered)  [43].  The 
exponent  (a)  indicates  whether  the  pan-genome  is 


open  (a  <  1)  or  closed  (a  >  1)  [41].  Using  this  equa¬ 
tion,  the  pan-genome  of  A.  baumannii  appears  to  be 
barely  closed  (a  =  1.03  ±  0.004;  Fig.  lb).  For  each  genome 
added,  the  number  of  new  genes  was  extrapolated  by 
calculating  tg(0)  (from  an  exponential  decay  function), 
which  was  determined  to  be  7  ±  0.4  (Fig.  lb). 

Since  a  large  number  of  the  A.  baumannii  isolates  in¬ 
cluded  in  this  study  were  of  MLST  ST  2  (Additional  file 
4),  it  is  possible  the  results  of  the  pan-genome  state  (i.e., 
open  versus  closed)  were  biased  toward  this  dominant  ST. 
Using  a  phylogenetic  tree  computed  from  the  BLAST 
score  ratio  (BSR)  distance  matrix  generated  by  PanOCT 
(Additional  file  6),  100  genomes  were  selected  by  hierarch¬ 
ical  clustering  (gold  label,  Additional  file  6).  This  set  of 
100  genomes,  which  represents  an  even  distribution  of 
A.  baumannii  genomic  diversity,  had  a  theoretical  max¬ 
imum  pan-genome  size  larger  than  the  combined  249 
dataset  (by  -3000  genes),  with  15,597  ±  173  genes  and 
57  ±  1.5  new  genes  discovered  for  each  genome  added 
(Fig.  lc).  The  pan-genome  of  the  diverse  100  genomes 
was  also  open  (a  =  0.63  ±  0.003;  Fig.  lc).  In  contrast, 
the  theoretical  maximum  pan-genome  size  obtained  from 
just  the  ST  2  genomes  decreased  to  7980  ±  68  genes 
and  the  ST  2  pan-genome  was  closed  (a  =  1.08  ±  0.002; 
Additional  file  7). 

Flexible  genomic  islands 

Genomic  variations  among  bacterial  strains  are  often 
found  to  be  mobile  elements  (e.g.,  prophage,  plasmids, 
integrated  elements),  or  variable  or  “flexible”  regions 
that  encode  genes  involved  in  cell  surface  structures 
(e.g.,  O-antigen,  capsular  polysaccharides,  teichoic  acid, 
S -layer,  flagella,  pili,  and  porins)  as  well  as  genes  for 
nutrient  utilization.  All  such  highly  variable  regions 
have  been  referred  to  as  flexible  genomic  islands  (fGIs) 
[44-50]. 

As  a  prerequisite  to  identifying  fGIs  in  the  pan-genome, 
a  consensus  core  backbone  and  fGI  assemblies  of  the  pan¬ 
genome  were  computed  using  gene_order.pl  (Additional 
file  8).  This  algorithm  uses  output  generated  by  PanOCT 
to  link  core  gene  clusters  (cGCs)  based  on  the  consensus 
of  the  layout  of  the  cGCs  in  individual  genomes  (Fig.  2a). 
The  cGCs  were  defined  as  containing  genes  from  75  %  or 
more  of  the  249  genomes,  resulting  in  a  consensus  core 
“pan-chromosome”  of  A.  baumannii  composed  of  3126 
genes  whose  coding  regions  totaled  2,988,228  bp.  When 
the  maximum  sizes  of  all  fGIs  were  inserted  into  the  core 
backbone,  the  maximum  size  of  the  pan-chromosome 
increased  to  5,070,600  bp,  which  is  1,047,552  bp  (-20  %) 
larger  than  the  average  genome  size  of  4,023,048  bp.  The 
constructed  pan-chromosome  had  a  circular  topology 
(rings  4  and  5  of  Fig.  2b),  indicating  that  cGCs  were  linked 
together  forming  a  circle  as  expected,  even  though  the 
majority  of  genome  assemblies  comprising  the  pan- 
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Fig.  1  Analysis  of  the  A.  baumannii  pan-genome.  The  distribution  of  protein  cluster  sizes  generated  from  the  comparison  of  249  A.  baumannii 
genomes  using  PanOCT  [35]  indicates  the  number  of  singleton  and  core  genes  (a).  The  pan-genome  size  ( left  panel )  and  the  number  of  novel 
genes  discovered  with  the  addition  of  each  new  genome  ( right  panel )  were  estimated  for  all  249  genomes  (b)  and  a  set  of  100  representative 
genomes  identified  by  hierarchical  clustering  of  all  249  genomes  (c)  using  a  pan-genome  model  based  on  the  original  Tettelin  et  al.  model 
[42],  Purple  circles  are  the  median  of  each  distribution  ( gray  circles).  Power  law  ( red  lines )  and  exponential  ( blue  lines)  regressions  were  plotted 
to  determine  a  (open/closed  status)  and  tg(0),  the  average  extrapolated  number  of  strain-specific/novel  genes,  respectively  [41] 


genome  are  in  draft  status  or  possibly  incomplete.  In 
addition  to  the  chromosome,  seven  additional  circular  “as¬ 
semblies”  were  determined  that  encode  between  2  and 
120  genes.  Five  of  the  circular  assemblies  were  identified 
as  sharing  homology  to  known  A.  baumannii  plasmids 
pABTJ2  [51],  pAB2  [52],  pRAY  [53],  and  p4ABAYE  [54]. 


Two  of  these  circular  assemblies  were  of  bacteriophage 
and  IS  element  origin. 

In  addition  to  generating  a  consensus  core  backbone, 
gene_order.pl  identified  the  location  of  flexible  genomic 
regions  (fGRs),  which  are  variable  regions  between  cGCs 
of  the  pan-chromosome  (Fig.  2a).  These  fGRs  are 
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Fig.  2  Pan-chromosome  and  fGIs.  The  core  gene  cluster  (cGC),  flexible  gene  cluster  (fGC)  and  flexible  genomic  region  (fGR)  locations  of  the  A 
boumonnii  pan-genome  (a)  were  computed  from  PanOCT  output  and  are  illustrated  as  a  circle  where  each  concentric  circle  is  numbered  from 
the  outermost  to  the  innermost  circle  (b).  fGR  locations  are  depicted  in  circles  1  (>20,000  bp),  2  (10,001-20,000  bp),  and  3  (1000-10,000  bp)  on  a  core 
backbone  of  genes  in  circles  4  (positive  strand)  and  5  (negative  strand).  Refer  to  the  key  for  details  on  color  representations,  circle  number  and  bar 
height.  Key  fGRs  are  noted  by  black  numbering  and  letters  K  (K-antigen),  N  (Novel),  and  O  (O-antigen)  as  in  Table  2.  Gray  numbers  indicate  positions 
on  the  pan-chromosome  in  megabase  pairs.  Predicted  functions  are  noted  in  gray  words  and  the  following  abbreviations:  Rl  resistance  island, 
Gl  genomic  island,  0  phage.  Genes  associated  with  Rl  insertions  are  labeled  in  gray  and  indicated  by  a  black  line.  The  frequency  of  individual  fGIs  within 
six  size  class  bins  was  also  determined  (c) 


composed  of  a  collection  of  fGIs  (Additional  file  9).  A 
given  fGI  is  an  instance  of  genomic  sequence  variation  ob¬ 
served  at  the  fGR.  Each  fGI  in  turn  is  made  up  of  individ¬ 
ual  linear  assemblies  of  flexible  genomic  clusters  (Fig.  2a). 
In  order  to  avoid  and  filter  out  spurious  fGIs  due  to 
random  IS  elements  or  bad  gene  calls,  we  required 
any  fGIs  carrying  less  than  three  genes  in  length  to 


be  present  in  at  least  10  %  of  the  genomes  analyzed. 
To  be  included  within  an  fGR,  we  required  fGIs  of 
three  or  more  genes  in  length  to  be  present  in  at 
least  three  genomes.  The  fGRs  are  illustrated  on  the 
outer  rings  1-3  of  Fig.  2b.  The  majority  of  fGIs  con¬ 
tained  between  one  and  ten  genes  (Fig.  2c),  which 
are  composed  of  IS  elements,  gene  duplications,  the 
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O-antigen  biosynthesis  cluster  (labeled  “O”  in  Fig.  2b) 
and  other  small  variable  biosynthetic  gene  clusters. 
There  were  89  fGIs  encoding  11-20  genes  (ring  2,  Fig.  2b) 
and  41  fGIs  encoding  21+  genes  (ring  1,  Fig.  2b).  The 
largest  fGI  encoded  97  genes,  was  79,689  bp  in  length, 
similar  to  phage  3  in  ACICU  [37],  and  highly  prevalent 
(present  in  151  genomes). 

fGIs  in  the  largest  fGRs 

The  largest  fGI  assemblies  within  the  20+  kb  fGR  size 
class  were  analyzed  for  functionality,  their  potential  role  in 
virulence,  survival,  drug  resistance,  and  evidence  of  lateral 
transfer.  While  many  fGRs  were  targets  for  insertion  of 
fGIs  that  encode  bacteriophage  components  (fGRs  4-7, 
11,  13,  14,  16,  20,  21,  23,  25,  26),  we  identified  metabolic 
pathways,  drug  resistance  genes,  and  potential  virulence 
factors  as  well  as  unusual  duplications  of  typical  core 
genes  that  were  inserted  within  the  largest  fGRs.  Some  of 
these  fGRs  contained  fGIs  that  were  reported  previously, 
such  as  the  putative  “alien  islands”,  a.k.a.  “pAs”,  reported 
in  the  MDR  A.  baumannii  strain  ACICU  [38]  (Table  2). 

Virulence  genes  in  fGRs 

In  A.  baumannii ,  the  outer  membrane  protein  OmpA  is 
associated  with  biofilm  formation  [55],  resistance  to  an¬ 
tibiotics  [56]  and  increased  cytotoxicity  of  outer  mem¬ 
brane  vesicles  in  cell  cultures  [57],  where  a  number  of 
OmpA  and  OmpA-like  proteins  were  present  in  outer 
membrane  vesicle  preparations.  Although  several  OmpA 
domain  proteins  were  found  in  the  core  pan-genome,  we 
also  located  SmpA/OmlA  family  proteins  and  multiple 
OmpA  domains  in  some  fGIs  within  fGR  9  (Fig.  2b, 
Table  2).  In  addition  to  OmpA-like  adhesins,  we  identified 
a  YadA-like  domain  protein  in  fGR  9,  which  in  Yersinia  is 
known  to  be  a  major  virulence  factor  functioning  in  adhe¬ 
sion  and  complement  evasion  [58]. 

fGIs  were  also  identified  that  encode  proteins  with 
putative  roles  in  iron  regulation.  For  example,  a  homolog 
of  the  ferric  uptake  regulator  protein  (Fur),  which  is  re¬ 
quired  for  iron  homoeostasis  and  defense  against  react¬ 
ive  oxygen  species  [59],  was  identified  in  an  fGI  within 
fGR  27  (Fig.  2b,  Table  2).  There  is  an  additional  copy  of 
fur  found  in  the  core  pan-genome,  which  was  previously 
identified  as  conserved  between  Acinetobacter  baylyi  and 
A.  baumannii  strains  [54].  Additionally,  putative  TonB- 
dependent  transporters/receptors  were  identified  in  fGR 
15  (Fig.  2b,  Table  2).  TonB-dependent  transporters/recep¬ 
tors  are  outer  membrane  proteins  that  bind  and  transport 
nutrients  for  energy  metabolism,  iron-chelating  sidero- 
phores,  and  other  metal-containing  complexes  [60],  have 
been  previously  shown  to  be  involved  in  bacterial  viru¬ 
lence  in  some  A.  baumannii  strains  and  were  horizontally 
transferred  [61],  as  is  consistent  with  being  within  a  fGI. 


At  least  four  other  TonB-like  transporter  genes  were  iden¬ 
tified  within  smaller  fGIs. 

Metabolic  pathways  within  fGIs 

Three  fGRs  (10,  18,  and  24;  Fig.  2b,  Table  2)  were  identi¬ 
fied  whose  predicted  protein  functions  fell  into  central 
metabolism  and  biosynthetic  pathway  role  categories.  A 
number  of  enzymes  of  the  aldehyde  dehydrogenase  family, 
such  as  vanillin  dehydrogenase,  acyl-CoA  dehydrogen¬ 
ase,  and  succinate-semialdehyde  dehydrogenase,  were 
identified  in  fGIs.  In  bacteria,  the  action  of  alcohol 
dehydrogenase  and  aldehyde  dehydrogenase  on  alcohol 
produces  organic  acids  like  acetic  acid  and  eventually 
acetyl-CoA.  The  acetyl-CoA  produced  enters  fatty  acid 
metabolism  and  the  tricarboxylic  acid  cycle.  It  has 
already  been  reported  that  low  concentrations  of  etha¬ 
nol  can  stimulate  growth  of  A.  baumannii  and  also  in¬ 
crease  its  pathogenicity  towards  some  organisms  [62]. 

Additionally,  enzymes  for  the  breakdown  of  aromatic 
compounds  indicate  metabolic  versatility  in  A.  baumannii 
to  possibly  enable  survival  on  alternative  carbon,  sulfur, 
and  nitrogen  sources  [63].  For  instance,  homoprotoca- 
techuate/hydroxyphenylacetate  degradation  (fGR  18)  and 
phenylpropanoid  degradation  (fGR  9)  pathways  can 
provide  intermediates  for  the  tricarboxylic  acid  cycle.  A 
phenylpropanoid/aromatic  degradation  pathway  (fGR  9) 
was  also  previously  mentioned  as  conserved  catabolic 
regions  (pca-qui  genes)  in  A.  baumannii  strain  AYE  and 
A.  baylyi  strain  ADP1  [54]. 

House-keeping  genes  in  fGIs 

We  also  observed  two  house-keeping  genes  in  fGIs 
(tRNA  ligase  genes  and  tat  ABC  system).  tRNA  ligases 
(a.k.a.  aminoacyl  tRNA  synthetases  or  “aaRSs”)  are  typic¬ 
ally  single  copy  essential  genes  with  rare  instances  of 
duplications  seen  in  few  bacteria,  Escherichia  coli  [64] 
and  Bacillus  subtilis  [65,  66]  being  two  such  examples. 
We  found  that  11  of  the  249  sequenced  A.  baumannii 
genomes  contain  one  or  more  tRNA  synthetase  duplica¬ 
tions  {tyrS,  cysS,  thrS;  fGRs  18,  27),  with  three  genomes 
carrying  cysS  and  thrS  duplications,  and  one  genome 
with  all  three  duplications.  Twin-arginine  translocation 
(Tat)  system  protein  translocases  TatA,  TatB,  and  TatC 
[67]  were  observed  in  eight  of  the  sequenced  genomes 
(fGR  15),  but  we  were  unable  to  identify  an  effector 
protein  with  a  Tat  secretion  signal  that  may  have  co¬ 
transferred  with  the  tatABC  operon. 

RIs  in  fGRs 

Because  RIs  are  composed  of  IS  elements,  composite 
transposons,  and  integrons,  which  are  by  definition  mo¬ 
bile  and  therefore  “flexible”,  we  predicted  that  our  algo¬ 
rithm  would  identify  them  as  fGIs,  but  it  was  unclear 
where  they  would  insert  into  the  core  pan-chromosome. 
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There  are  four  known  hot  spots  for  insertion,  including 
comM  [26,  68-70],  pho  [37,  71],  astA  [69],  and  an  acetyl- 
transferase  (acetylT)  gene  (a.k.a.  HPA2  in  [40]).  Three 
fGRs  (1,  10,  and  22;  Table  2)  were  discovered,  corre¬ 
sponding  to  the  known  locations  within  or  adjacent  to 
comM,  astA ,  and  acetylT,  respectively;  however,  we  did 
not  observe  an  fGR/fGI  near  pho.  Drug  resistance  (DR) 
genes  in  Rl-associated  fGRs  were  only  observed  at  the 
comM  (fGR  1)  locus,  which  comprised  13  of  the  38  fGIs 
and  31  drug  resistance  genes  (Additional  file  10).  In 
addition  to  the  known  RIs,  a  putative  novel  RI  was  dis¬ 
covered  in  an  fGI  26,097  bp  in  length,  encoding  a 
metallo-beta-lactamase  (ACIN5143_A3078  and  ACIN- 
NAV18_0027)  and  located  within  fGR  27  (Table  2,  Fig.  2b), 
residing  in  two  military  isolates  sequenced  in  this  study 
(OIFC143  and  Naval- 18). 

Identification  of  RI  signatures 

As  RIs  are  made  up  of  one  or  more  transposable  ele¬ 
ments  and  most  of  the  sequenced  A.  haumannii  ge¬ 
nomes  are  not  finished  and  are,  therefore,  represented  as 
multiple  genomic  contigs,  RIs  are  often  difficult  to 
characterize.  Even  with  the  use  of  the  novel  pan¬ 
chromosome  consensus-building  algorithm  described 
above,  RIs  appear  to  be  fragmented  and  represented  as 
multiple  fGIs.  Therefore,  to  better  identify  RI  insertion 
events  in  draft  genomes,  a  high-throughput  bioinformat¬ 
ics  approach  was  developed  and  implemented.  This  ap¬ 
proach  characterized  RI  signatures  rather  than  complete 
RI  structures.  RI  signatures  are  defined  as  both  the  gen¬ 
omic  location  and  the  type  of  RI  insertion  identified  in 
an  individual  isolate.  The  approach  searches  for  inser¬ 
tions  within  known  RI  insertion  hot  spots  comM ,  pho, 
astA,  and  acetylT,  and  identifies  homology  with  a  group 
of  carefully  selected  representative  RIs  to  minimize  re¬ 
dundancy  from  among  those  previously  reported  in  A. 
haumannii,  including  AbaR3  [37],  AbaR4  [70],  AbGRIl 
and  AbGRI2  [69],  and  Tn 1548  [72]  (Additional  file  11). 

Using  this  bioinformatics  approach,  a  total  of  173  out  of 
247  (70  %)  A.  haumannii  genomes  analyzed  were  scored 
as  Rl-positive  and  assigned  RI  signatures  (Additional  file 
12A-G).  Individual  clone  types  showed  insertion  site  pref¬ 
erences  and  carried  specific  RI  signatures  (Fig.  3a-c). 
While  RI  insertions  in  comM  were  common  among  mul¬ 
tiple  clone  types,  insertions  outside  of  comM  were  only 
detected  at  the  pho  locus  in  clonal  complex  1  (CC1)  iso¬ 
lates,  and  only  at  the  astA  or  acetylT  loci  in  CC2  isolates 
(Fig.  3a;  Additional  file  12b-d).  Two  distinct  types  of  RIs 
were  identified  at  the  comM  locus  of  Rl-positive  isolates: 
AbaR3  or  AbaR4  in  CC1  (22  out  of  26  isolates)  and  pre¬ 
dominantly  AbGRIl  in  CC2  (101  out  of  105  isolates) 
(Fig.  3a;  Additional  file  12a,  b).  At  non -comM  loci,  only  a 
single  type  of  RI  insertion  was  detected;  either  AbaR4  at 
pho  in  CC1  isolates,  or  in  CC2  isolates,  AbGRI2  at  astA  or 


Tn 1548  at  acetylT  (Fig.  3a;  Additional  file  12b-d).  Among 
the  group  of  123  CC1  and  CC2  isolates  identified  to  carry 
major  RI  signatures,  67  isolates  (54  %)  carried  more  than 
one  RI  insertion  in  the  genome  versus  56  (46  %)  carrying 
single  RI  insertions  (Additional  file  12b). 

To  determine  the  mechanism  of  RI  inheritance 
(i.e.,  vertical  or  horizontal)  and  to  understand  their 
evolution  in  individual  clonal  lineages,  a  whole  gen¬ 
ome  single  nucleotide  polymorphism  (SNP)  tree  was 
constructed  for  all  isolates  analyzed  (including  four  non- 
haumannii  outgroups)  (Additional  file  13).  The  SNP  tree 
was  defined  by  ~  150,000  variant  positions  located  on  the 
backbone  of  the  genomes  by  excluding  regions  with  un¬ 
usually  high  SNP  density  (Additional  file  14). 

Phylogenetic  relationships  of  the  isolates  as  shown  by 
the  SNP  tree  were  similar  to  those  in  the  BSR  tree  in 
that  A.  haumannii  isolates  were  grouped  by  MLST  type 
with  exceptions  for  certain  allelic  differences  within 
CC2.  Many  of  the  genomes  that  cluster  between  strains 
of  the  major  STs  were  off  by  one  allele  from  the  major 
ST,  making  them  a  member  of  a  CC  [73].  However, 
MRSN  4106,  3405  and  3942  (i.e.,  ST94)  differed  from 
ST  1  by  two  alleles,  suggesting  possible  horizontal  gene 
transfer  of  MLST  markers  in  these  strains.  It  is  clear 
from  both  the  BSR  tree  and  the  SNP  tree  that  the  military 
isolates  cover  a  spectrum  of  genome  diversity,  confirming 
the  observed  diversity  via  PFGE  (Additional  file  1). 

When  the  RI  signatures  were  superimposed  onto  the 
SNP  tree,  specific  patterns  of  RI  distribution  were  ob¬ 
served  across  different  sequence  types  (Additional  file  13). 
For  example,  the  distribution  of  AbGRIl,  which  is  pre¬ 
dominantly  found  in  CC2  isolates,  appeared  largely  to  be 
the  result  of  vertical  inheritance.  It  is  interesting  to  note 
that  an  entire  clade  does  not  carry  the  AbGRIl  RI  (tri¬ 
angle,  Additional  file  13).  In  contrast  AbaR3,  which  is 
mostly  found  in  CC1  strains,  showed  a  more  scattered 
pattern  of  inheritance  with  seemingly  equal  numbers 
with  and  without  this  RI.  However,  it  should  be  noted 
that  the  absence  of  a  detectable  RI  signature  in  this 
approach  could  be  due  to  the  incompleteness  of  the 
draft  genome  assemblies.  Additional  examples  of  appar¬ 
ent  clonal  or  vertical  inheritance  were  insertions  in  pho 
in  a  subgroup  of  CC1  isolates,  two  novel  insertions  dis¬ 
cussed  in  the  next  section,  including  a  7.8  kb  non-RI 
gene  insertion  in  acetylT  in  the  entire  group  of  ST  25 
isolates  and  a  composite  IS26  insertion  in  acyl-CoA 
synthase  (acylCS)  in  the  entire  group  of  CC3  isolates 
(Additional  file  13). 

Identification  of  novel  RIs  and  GIs 

During  the  analysis  of  RI  signatures,  we  identified  a  novel 
RI  insertion  detected  at  a  genomic  region  (ACICU  posi¬ 
tions  157,224-165,463  nucleotides)  flanked  by  acylCS 
(ACICU_00319)  and  a  predicted  transporter  protein 
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Fig.  3  Clone-type  specific  Rl  signatures  and  Rl  insertion  frequencies  over  time,  a-c  Genomic  locations  of  Rl  signatures  specific  to  clonal  complexes 
(CCs)  1-3  and  ST  25.  In  CC1  isolates,  Rl  insertions  are  detected  in  the  comM  and/or  pho  loci  (a).  In  CC2  isolates,  Rl  insertions  are  found  in  the  comM 
locus,  and  in  addition  the  astA,  or  acetylT  locus.  A  novel  Rl  insertion  (composite  IS26)  was  identified  in  CC3  isolates  at  the  acyl-CoA  synthase  (acylCS) 
locus  (b).  Gene  annotation  of  the  composite  IS26  Rl  is  shown  to  the  right.  Drug  resistance  genes  {green):  immediate  flanking  genes  acylCS  and  a 
transporter  protein  {dark  orange).  A  novel  Gl  was  identified  in  ST  25  isolates  at  the  acetylT  locus  (c).  Gene  annotation  of  the  7.8  kb  Gl  is  shown  to  the 
right.  Salicylate  monooxygenase  {blue);  immediate  flanking  genes  acetylT  and  mdtL  {dark  orange).  A  cumulative  frequency  graph  showing  the  number 
of  Rl-positive  isolates  carrying  single,  double,  or  triple  Rl  insertions  collected  between  1950  and  201 1  (d).  Isolates  that  represent  the  first  occurrence  of 
a  given  Rl  signature  are  labeled  on  the  graph  (numbered  1-9) 
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(ACICU_00143).  The  novel  RI  replaces  an  8  kb  genomic 
region  with  an  18  kb  RI  identical  to  a  previously  reported 
composite  IS26  transposon  carrying  a  class  I  integron 
(GenBank  accession  JX041889)  [74].  The  composite  RI 
carries  two  antibiotic  resistance  gene  cassettes,  including 
sulI-qacEdeltal-aadB-intll  (resistance  to  sulfonamides 
and  gentamycin)  and  tetR-tetA  (resistance  to  tetracycline). 
The  gene  structure  of  this  novel  RI  is  shown  in  Fig.  3b.  A 
fragment  of  the  RI  was  also  detected  within  fGR  “N” 
(Fig.  2b,  Table  2).  This  composite  IS26  RI  was  detected  ex¬ 
clusively  in  all  eight  of  the  CC3  isolates  analyzed,  includ¬ 
ing  six  sequenced  in  this  study  (Additional  file  12b,  e). 
Seven  of  these  isolates  were  MDR  strains  collected  from 
the  military  healthcare  system  between  2003  and  2009 
from  wound,  blood,  catheter,  or  unknown  sources.  The 
earliest  sequenced  CC3  isolate  was  collected  in  the 
Netherlands  in  1997  and  contained  only  the  5'  fragment 
of  the  composite  IS26  RI,  which  carried  only  one  resist¬ 
ance  gene  cassette,  sulI-qacEdeltal-aadB-intll ,  rather 
than  the  full  length  version  (Additional  file  15). 

In  addition  to  the  composite  IS26  RI,  we  also  identi¬ 
fied  a  novel  non-RI  7.8  kb  genomic  island  (GI)  juxta¬ 
posed  to  the  acetylT  locus  in  the  absence  of  the 
Tn 1548  RI  insertion  commonly  found  at  this  location 
(Fig.  3c;  Additional  file  16).  This  novel  non-RI  insertion 
was  also  detected  within  fGR  22  (Fig.  2b,  Table  2).  The 
7.8  kb  GI  shared  over  90  %  identity  at  the  nucleotide 
level  with  A.  calcoaceticus  PHEA-2  and  carried  six  an¬ 
notated  open  reading  frames  (ORFs),  including  genes 
encoding  a  fatty  acid  hydroxylase  and  a  salicylate 
monooxygenase.  Salicylate  monooxygenase,  normally  ab¬ 
sent  from  the  A.  baumannii  genome,  is  involved  in  the 
conversion  of  salicylate  to  catechol,  which  could  possibly 
be  used  as  a  building  block  for  the  construction  of 
catecholate-type  siderophores.  The  acetylT/7.8  kb  insertion 
was  detected  among  all  seven  isolates  of  the  non-major  se¬ 
quence  type  ST  25  (Additional  file  12b,  f).  Two  of  these 
isolates,  OIFC143  and  Naval-18,  were  sequenced  in  this 
study. 

Evolution  of  RI  insertion  site  usage  from  single  to 
multiple  RI  insertions 

To  provide  insight  into  how  RI  signatures  and  insertion 
site  usage  have  evolved  over  time,  insertion  site  usage 
was  plotted  by  cumulative  frequency  (Fig.  3d).  Three 
phases  of  site  usage  were  observed,  with  a  single  RI 
insertion  site  in  1951,  double  insertions  in  1999,  and 
more  recently,  triple  insertions  in  2011  (Fig.  3d). 
During  the  first  phase,  single  insertions  were  detected 
either  at  the  comM,  acetylT  (7.8  kb)  or  acylCS  loci. 
During  the  second  phase,  RI  insertions  were  detected  at 
comM  in  conjunction  with  a  second  insertion  at  pho , 
astA  or  acetylT.  Finally,  triple  insertions  were  observed 
at  comM ,  astA ,  and  acetylT  in  the  MDR-TJ  isolate. 


These  results  showed  a  rapid  increase  in  the  number  of  RI 
insertions  during  the  course  of  evolution  of  A.  baumannii 
for  antibiotic  resistance.  Analysis  of  additional  genome 
sequences  will  help  to  further  confirm  the  above 
observations. 


The  gain  and  loss  of  virulence  gene  content 

To  better  determine  the  presence  or  absence  of  specific 
gene  clusters  associated  with  virulence  and  survival,  we 
studied  the  distribution  and  conservation  of  known  viru¬ 
lence  genes  across  all  isolates.  We  detected  the  gain  and 
loss  of  gene  clusters  at  both  the  protein  and  nucleotide 
levels  based  on  centroid-to-ortholog  derived  BSR  analysis 
followed  by  whole  genome  sequence  alignments.  Among 
the  ten  classes  of  known  virulence/ survival  mechanisms 
analyzed,  including  a  collection  of  178  genes  (Additional 
file  17),  three  classes  of  genes  (type  I  pili,  siderophores, 
and  efflux  pumps)  showed  distinct  gain/loss  variations 
among  the  isolates.  A  heat  map  generated  based  on 
centroid-to-ortholog  derived  BSR  is  shown  in  Additional 
file  18  (BSR  values  in  Additional  file  19).  A  summary  of 
the  diversity  of  three  virulence  properties  (i.e.,  adhe¬ 
sion,  iron  acquisition,  and  efflux)  among  the  isolates 
analyzed  is  shown  in  Table  3  and  Additional  file  20. 

The  csuAB-E  gene  cluster  has  been  shown  to  encode  a 
chaperone-usher  type  I  pili  system  [75]  and  is  function¬ 
ally  characterized  [76,  77].  A.  baumannii  also  encodes 
two  additional  related  type  I  pili  clusters  [78].  The  pres¬ 
ence  of  the  csuAB-E  gene  cluster  has  been  shown  to  be 
variable  among  relatively  smaller  subsets  of  A.  bauman¬ 
nii  genomes  studied  [31,  40,  78].  We  observed  the  dele¬ 
tion  of  the  csu  gene  cluster  (i.e.,  type  I  pili  cluster  1) 
only  in  certain  ST  2  and  ST  10  isolates  as  42  and  17  kb 
deletions,  respectively  (Table  3,  Fig.  4).  Deletions  of  the 
csu  gene  cluster  in  ST  2  strains  have  been  previously 
reported  [40,  79],  but  the  17  kb  deletion  is  a  novel  dis¬ 
covery.  These  csu  deletions  appeared  to  be  the  result  of 
independent  molecular  events  based  on  observations 
that  the  deletions  occurred  in  different  lineages  as 
shown  on  the  SNP  tree  (Additional  file  13),  and  the  dis¬ 
tinct  sizes  of  the  deletions  (Fig.  5a).  Furthermore,  type  I 
pili  cluster  2  was  detected  across  all  isolates  except  two 
strains,  NIPH  60  and  SDF.  Type  I  pili  cluster  3  was 
present  among  CC1  and  CC3  isolates  but  absent  from 
all  ST  2,  ST  25,  ST  79,  ST  113,  and  ST  215  isolates 
(Additional  files  20  and  21;  total  =  113  isolates)  as 
shown  in  the  centroid-ortholog  BSR-derived  heat  map 
(Additional  file  18).  It  should  be  noted  that  by  taking 
into  account  the  overall  genomic  content  of  type  I  pili, 
strain  MDR-ZJ06  and  nine  UH  clade  B  isolates  encoded 
a  single  type  I  pilus  represented  by  cluster  2.  The  func¬ 
tional  significance  for  type  I  pili  expressed  from  different 
clusters  is  yet  to  be  determined.  Interestingly,  six  out  of 
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Table  3  Diversity  of  virulence  gene  content  across  A  baumannii  isolates 

Gene  clusters 


Isolates 

Genome  category 

Source 

Year 

ST 

Allele  summary 

Country 

1 

2 

3 

4 

Type  1  pili 

AYE 

Global 

Urinary 

2001 

1 

1-1 -M -5-1-1 

France 

+ 

+ 

+ 

ACICU 

Global 

Internal 

2005 

2 

2-2-2-2-2-2-2 

Italy 

+ 

+ 

- 

SDF 

Global 

Miscellaneous 

2000 

17 

3-29-30-1-9-1-4 

France 

- 

- 

- 

NIPH  335 

Global 

Respiratory 

1994 

10 

1 -3-2-1 -4-4-4 

Czech  Republic 

- 

+ 

+ 

OIFC098* 

WRAIR 

Miscellaneous 

2003 

10 

1 -3-2-1 -4-4-4 

Germany 

- 

+ 

+ 

MDR-ZJ06 

Global 

Blood 

2006 

2 

2-2-2-2-2-2-2 

China 

j 

+ 

- 

UH  clade  B2 

US  hospital 

Respiratory3 

2007 

2 

2-2-2-2-2-2-1 

USA 

j 

+ 

- 

NIPH  60 

Global 

Respiratory 

1992 

34 

8-1-14-3-12-1-13 

Czech  Republic 

+ 

- 

+ 

NIPH  528 

Global 

Unknown 

1982 

2 

2-1-2-2-2-2-2 

Netherlands 

+ 

+ 

_4 

MDR-TJ 

Global 

Miscellaneous 

Before  201 1 

2 

2-2-2-2-2-2-2 

China 

+ 

+ 

_4 

OIFC143* 

WRAIR 

Wound 

2003 

25 

3-3-2-4-7-2-4 

USA 

+ 

+ 

_4 

Siderophores 

AYE 

Global 

Urinary 

2001 

1 

1-1-1-1-5-1-1 

France 

+ 

- 

+ 

- 

ACICU 

Global 

Internal 

2005 

2 

2-2-2-2-2-2-2 

Italy 

+ 

- 

+ 

- 

SDF 

Global 

Miscellaneous 

2000 

17 

3-29-30-1-9-1-4 

France 

- 

- 

- 

- 

NIPH  190 

Global 

Unknown 

1993 

9 

3-1 -5-3-6- 1-3 

Czech  Republic 

- 

- 

+ 

- 

NIPH  41 05 

Global 

Blood 

1996 

39 

10-4-3-2-13-1-2 

Czech  Republic 

- 

- 

+ 

- 

OIFC0162* 

WRAIR 

Respiratory 

2003 

412 

1-52-2-2-67-4-5 

USA 

- 

- 

+ 

- 

OIFC047*5 

WRAIR 

Miscellaneous 

2003 

Novel 

1-75-2-2-67-1-2 

USA 

- 

- 

+ 

- 

NavaF82* 

WRAIR 

Blood 

2006 

410 

3-1-2-3-6-1-16 

USA 

- 

- 

+ 

- 

WC-348* 

WRAIR 

Skin 

2008 

412 

1-52-2-2-67-4-5 

Iraq 

- 

- 

+ 

- 

ATCC  1 7978 

Global 

Miscellaneous 

1951 

437 

3-2-2-2-30-4-28 

NA 

+ 

+6 

+ 

- 

6013113 

Global 

Skin 

2007 

81 

1-1 -1-1 -5-1 -2 

England 

+ 

+ 

+ 

- 

6013150 

Global 

Skin 

2007 

81 

M -1-1 -5-1 -2 

England 

+ 

+ 

+ 

- 

MRSN  3527 * 

MRSN 

Wound 

2011 

81 

1-1 -1-1 -5-1 -2 

USA 

+ 

+ 

+ 

- 

MRSN  3405* 

MRSN 

Wound 

2011 

94 

1 -2-2-1 -5-1-1 

USA 

+ 

+ 

+ 

- 

MRSN  3942 * 

MRSN 

Wound 

2011 

94 

1 -2-2-1 -5-1-1 

USA 

+ 

+ 

+ 

- 

MRSN  4106* 

MRSN 

Wound 

2011 

94 

1 -2-2-1 -5-1-1 

USA 

+ 

+ 

+ 

- 

WC-487*7 

WRAIR 

Skin 

2008 

410 

20-26-26-14-26-16-23 

Iraq 

- 

- 

- 

+8 

Efflux  pumps 

AYE 

Global 

Urinary 

2001 

1 

1-1-1-1-5-1-1 

France 

+ 

+ 

+ 

ACICU 

Global 

Internal 

2005 

2 

2-2-2-2-2-2-2 

Italy 

+ 

+ 

+ 

SDF 

Global 

Miscellaneous 

2000 

17 

3-29-30-1-9-1-4 

France 

- 

+ 

+ 

NIPH  60 

Global 

Respiratory 

1992 

34 

8-1-14-3-12-1-13 

Czech  Republic 

- 

+ 

+ 

NIPH  80 

Global 

Blood 

1993 

37 

3-2-2-2-7-1-2 

Czech  Republic 

- 

+ 

+ 

NIPH  615 

Global 

Respiratory 

1994 

12 

3-5-7-1 -7-2-6 

Czech  Republic 

- 

+ 

+ 

NIPH  41 05 

Global 

Blood 

1996 

39 

10-4-3-2-13-1-2 

Czech  Republic 

- 

+ 

+ 

OIFC047*5 

WRAIR 

Miscellaneous 

2003 

Novel 

1-75-2-2-67-1-2 

USA 

- 

+ 

+ 

OIFC111* 

WRAIR 

Miscellaneous 

2003 

49 

3-3-6-2-3-1  -5 

USA 

- 

+ 

+ 

AB900 

WRAIR 

Skin 

2003 

49 

3-3-6-2-3-1  -5 

USA 

- 

+ 

+ 

AB_TG27343 

Global 

Wound 

2005 

422 

26-72-2-2-29-4-5 

USA 

- 

+ 

+ 
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Table  3  Diversity  of  virulence  gene  content  across  A  baumannii  isolates  (Continued) 


Isolates 

Genome  category 

Source 

Year 

ST 

Allele  summary 

Country 

Gene  clusters 

12  3  4 

AB_1 536-8 

Global 

Unknown 

2006 

413 

1-3-2-2-5-8-12 

USA 

+ 

+ 

AB_1 583-8 

Global 

Unknown 

2006 

422 

26-72-2-2-29-4-5 

USA 

+ 

+ 

Naval-72* 

WRAIR 

Wound 

2006 

405 

5-3-16-4-29-1-60 

USA 

+ 

+ 

ZWS1122 

Global 

Blood 

2011 

2 

2-2-2-2-2-2-2 

China 

+ 

+ 

ZWS1219 

Global 

Blood 

2011 

2 

2-2-2-2-2-2-2 

China 

+ 

+ 

BJAB0715 

Global 

Miscellaneous 

5/2007-4/2008 

23 

1-3-10-1-4-4-4 

China 

+ 

+ 

Isolate  name:  finished  genomes  (bold),  pre-2000  isolates  (underline),  sequenced  in  this  study  (italics  with  asterisk) 

Specific  gain  or  loss  of  gene  clusters  with  respect  to  majority  of  isolates  analyzed  and  reference  genomes  AYE,  ACICU,  SDF  shown  as  "+"  and  signs,  respectively 

(1)  A  42  kb  deletion  was  detected  in  ST2  strain  MDR-ZJ06  and  the  UH  clade  B  isolates,  in  contrast  to  the  1 7  kb  deletion  observed  in  ST10  strains  NIPH  335  and  OIFC098 

(2)  Nine  UH  clade  B  isolates  carry  a  deletion  of  the  type  I  pili  csu  gene  cluster  [40] 

(3)  Six  out  of  nine  UH  clade  B  isolates  which  showed  a  loss  of  the  type  I  pili  csu  gene  cluster  are  of  respiratory  origin 

(4)  Loss  of  type  I  pili  cluster  3  was  detected  in  133  isolates,  including  ST2,  ST25,  ST79,  ST1 13,  and  ST215.  Only  three  isolates  are  shown.  Similar  gene  loss  was  not 
detected  in  ST1  or  ST3  isolates 

(5)  Two  isolates  (NIPH  410,  OIFC047)  had  a  dual  loss  of  siderophore  cluster  1  and  efflux  cluster  1  (AdeABC) 

(6)  Insertion  of  siderophore  cluster  2  was  detected  at  3  Mb  in  ATCC  17978,  which  differed  from  the  location  identified  in  ST81  and  ST94  isolates  at  3.8  Mb 
(coordinates  based  on  ACICU  genome) 

(7)  WC-487  is  a  non-baumannii  Acinetobacter  sp.  isolate 

(8)  Siderophore  cluster  4  was  also  detected  in  A.  baumannii  8399  [78,  80] 

ATCC  American  Type  Culture  Collection,  NA  not  available,  WRAIR  Walter  Reed  Army  Institute  of  Research 


nine  UH  clade  B  isolates  originated  from  respiratory  sam¬ 
ples,  and  all  have  been  reported  to  be  MDR  (Table  3). 

Siderophores  are  iron  uptake  machinery  for  bacterial 
survival  and  virulence  under  limiting  iron  conditions 
and  are  encoded  in  five  known  clusters/genomic  islands  in 
A.  baumannii  [78,  80].  We  observed  that  ST  1  (e.g.,  AYE), 
ST  2  (e.g.,  ACICU)  and  most  isolates  analyzed  in  general 
carried  siderophore  cluster  1  (A1S_1647  to  A1S_1657)  and 
cluster  3  (A1S_2372  to  A1S_2392)  (Table  3;  Additional  file 
18),  which  were  also  part  of  the  core  pan-genome.  How¬ 
ever,  siderophore  cluster  1  was  missing  in  four  US  military 
Walter  Reed  Army  Institute  of  Research  (WRAIR)  isolates 
sequenced  in  this  study  (strains  OIFC0162,  OIFC047, 
Naval-82,  and  WC-348)  and  two  additional  isolates  (NIPH 
190  and  NIPH  410)  (Table  3;  Additional  file  22).  Despite 
belonging  to  different  sequence  types,  all  six  isolates  shared 
close  phylogenetic  distances  as  shown  on  the  SNP  tree 
(Additional  file  13).  We  showed  that  siderophore  cluster  3, 
encoding  the  key  A.  baumannii  siderophore  acinetobactin, 
was  detected  among  all  isolates  analyzed  except  SDF  (from 
body  louse)  and  the  non-A  baumannii  isolate  WC-487. 

In  addition,  we  also  observed  the  acquisition  of  a  sid¬ 
erophore  gene  cluster  among  specific  isolates.  Sidero¬ 
phore  cluster  2  was  rarely  found  in  A.  baumannii  and 
only  previously  reported  in  two  isolates:  ATCC  17978 
(Figs.  4  and  5b)  collected  in  1951  and  A.  baylyi  ADP1 
[78].  Siderophore  cluster  2  was  also  found  on  an  fGI 
(Assembly_fGI  41,  Additional  file  8).  In  our  analysis,  we 
detected  cluster  2  in  six  additional  isolates  belonging  to 
ST  81  and  ST  94  collected  between  2007  and  2011 
(Table  3,  Figs.  4  and  5c).  The  six  isolates  shared  a  common 
insertion  site  for  siderophore  cluster  2  at  3.8  Mbp  different 
from  that  of  ATCC  17978  at  3.0  Mbp  (reference  genomic 


coordinates  were  based  on  the  ACICU  genome,  which 
does  not  carry  the  insertion).  The  six  isolates  are  also 
phylogenetically  distinct  from  ATCC  17978  as  shown  on 
the  SNP  tree  (Additional  file  13).  Among  the  six  isolates, 
four  were  isolated  from  wound  samples  of  the  US  military 
MRSN  collection  sequenced  in  this  study.  Further  studies 
are  needed  to  determine  if  siderophore  cluster  2  is  associ¬ 
ated  with  different  iron  availability  in  military  wound  sam¬ 
ples.  Lastly,  siderophore  cluster  4,  which  was  previously 
identified  in  A.  baumannii  isolate  8399  [78,  80],  was  only 
identified  in  one  isolate  in  this  study,  the  non-A.  bauman¬ 
nii  isolate  WC-487  (Table  3). 

Efflux  pumps  are  outer  membrane  proteins  that  drive 
the  expulsion  of  antimicrobials  leading  to  resistance 
against  aminoglycosides,  (1-lactams,  chloramphenicol, 
erythromycin  and  tetracycline  [81].  We  noted  that  the 
AdeABC  efflux  (A1S_1823  to  A1S_1825)  gene  cluster 
was  deleted  in  a  small  set  of  isolates  across  multiple 
strain  types  (Table  3).  Similar  to  SDF,  two  isolates, 
OIFC047  and  NIPH  410,  which  are  phylogenetically  closely 
related  as  shown  on  the  SNP  tree  (Additional  file  13), 
showed  a  dual  loss  of  the  AdeABC  efflux  cluster  and  the 
siderophore  cluster  1  (Table  3).  Determining  the  func¬ 
tional  consequence  of  the  gene  loss  will  aid  in  the 
characterization  of  the  significance  of  these  specific 
virulence  determinants. 

Discussion 

In  this  study,  the  draft  genome  sequences  of  50  A. 
baumannii  isolates  from  the  military  healthcare  system 
were  determined  and  analyzed  within  the  framework  of 
a  249  isolate  pan-genome,  to  identify  the  genetic  deter¬ 
minants  underlying  MDR  and  virulence  properties  in 
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Fig.  4  Virulence  and  fitness  factors  displaying  variable  gene  content  based  on  centroid-to-ortholog  BSR.  Specific  genomic  regions  involved  in  the 
assembly  of  type  I  pili,  siderophore  production  and  efflux  were  highly  variable  and  showed  specific  gain  or  loss  of  entire  gene  clusters  in  isolates 
analyzed.  In  the  BSR-based  heat  map,  the  presence,  absence,  and  low  similarity  of  a  protein  ortholog  compared  with  its  centroid  is  shown  in 
yellow  (BSR  =  1,  presence),  blue  (BSR  =  0,  absence),  and  gray  (low  similarity  or  truncated),  respectively.  The  year  of  collection  is  shown  above  strain 
names.  Isolates  collected  prior  to  year  2000  are  indicated  with  an  asterisk.  In  summary,  gene  gain/loss  events  involved  in  virulence  and  survival 
were  detected  in  decades-old  isolates  and  appeared  to  have  reemerged  among  recent  isolates.  The  list  of  virulence  genes  analyzed  and  the  full 
size  version  of  the  BSR-derived  heat  map  are  provided  in  Additional  files  17  and  19,  respectively 


the  context  of  strain  diversity  and  evolution.  Using  a 
novel  graph-based  approach,  we  identified  highly  vari¬ 
able  and  dynamic  genomic  content  of  the  A.  baumannii 
genome,  which  may  be  the  result  of  its  rapid  adaption 
and  survival  in  both  biotic  and  abiotic  environments 
through  the  gain  and  loss  of  gene  clusters  controlling 
fitness.  Importantly,  our  results  show  that  some  of  the 
adaptation  mechanisms  (e.g.,  gain/loss  of  pili  and  sid¬ 
erophore  gene  clusters)  existed  in  decades-old  isolates 
and  appeared  to  have  reemerged  among  recent  isolates. 
This  study  will  provide  a  valuable  framework  and  gen¬ 
etic  landmarks  for  surveillance,  prediction  of  outbreaks, 
and  understanding  the  epidemiology  of  globally  distrib¬ 
uted  isolates. 


A.  baumannii  pan-genome 

To  determine  whether  the  genomic  diversity  of  A.  bau¬ 
mannii  has  been  captured  among  all  sequenced  isolates 
(i.e.,  a  closed  pan-genome)  and  to  understand  how  the  50 
selected  military  isolates  were  evolutionarily  related  to 
previously  sequenced  isolates,  we  conducted,  to  our 
knowledge,  the  first  A.  baumannii  pan-genome  analysis 
on  the  most  expansive  set  of  isolates,  including  249  ge¬ 
nomes.  We  observed  1867  core  (100  %  membership), 
2833  core  (95  %  membership)  protein  clusters  and  a 
paralog-collapsed  pan-genome  cluster  size  of  11,694  pro¬ 
teins.  For  comparison,  in  a  pan-genome  study  of  186 
E.  coli  strains  (~1  Mbp  larger  than  A.  baumannii  and 
~1000  more  genes  per  genome),  there  were  1702  core 
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Fig.  5  (See  legend  on  next  page.) 
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0  \ 
(See  figure  on  previous  page.) 

Fig.  5  Loss  of  pili  cluster  1  ( csu  gene  cluster)  and  gain  of  siderophore  cluster  2  in  specific  isolates,  a  Two  types  of  deletion  were  observed  which 
led  to  a  complete  loss  of  the  type  I  pilus  csuAB-E  gene  cluster.  A  novel  17  kb  deletion  was  detected  in  NIPH  335  and  OIFC098,  whereas  a 
previously  reported  42  kb  deletion  was  found  in  MDR-ZJ06  and  nine  UH  clade  B  isolates  (e.g.,  UH6207).  b  Siderophore  cluster  2  was  detected  only 
in  a  small  subset  of  isolates  across  all  249  analyzed.  Two  apparently  independent  molecular  events  were  observed  among  the  siderophore  cluster 
2-positive  isolates.  In  decades-old  isolate  ATCC  17978,  insertion  of  the  gene  cluster  was  detected  at  a  genomic  position  corresponding  to  3.0 
Mbp  of  the  ACICU  reference  genome,  c  In  the  remaining  siderophore  cluster  2-positive  modern  isolates  (e.g.,  MRSN  3405),  insertion  was  detected 
at  a  different  location,  which  corresponds  to  3.8  Mbp  of  the  ACICU  reference  genome.  Since  ATCC  17978  was  isolated  in  1951  while  other  isolates 
were  isolated  more  recently  between  2007  and  2011,  the  acquiring  of  siderophore  cluster  2  among  modern  isolates  could  be  an  example  of  the 
reemergence  of  a  survival  mechanism  of  A  baumannii.  The  functional  significance  of  siderophore  cluster  2  is  yet  to  be  determined.  Key:  pairwise 
nucleotide  identity  shown  in  red  to  blue  (100  %  identity)  color  scale;  contig  breaks  {pink  vertical  bars)]  open  reading-frames  {thick  arrows );  type  I 
pilus  cluster  1  and  siderophore  cluster  2  genes  {green)]  deleted  genes  {gray  scale)]  genes  bordering  insertions/deletions  {dark  orange  and  brown)] 
other  flanking  genes  {orange)]  other  genes  {light  brown) 


(100  %),  3051  core  (95  %),  and  a  pan-genome  cluster  size 
of  16,373  proteins  [82].  This  shows  that  even  though  the 
average  genome  size  of  E.  coli  is  larger  by  ~1  Mbp,  the 
core  pan-genome  cluster  size  is  similar  to  A.  baumannii 
However,  the  larger  pan-genome  cluster  size  observed  in 
E.  coli  (by  -5000  proteins)  may  reflect  a  higher  proportion 
of  variable/flexible  regions  within  the  pan-genome  of 
E.  coli  compared  with  A.  baumannii. 

Pan-genome  open  or  closed? 

Our  initial  analysis  of  all  249  genomes  suggested  that 
the  pan-genome  was  closed;  however,  after  determining 
that  around  half  of  the  genomes  were  from  highly  re¬ 
lated  strains  of  MLST  CC2,  we  tested  whether  inclu¬ 
sion  of  highly  similar  strains  can  alter  the  pan-genome 
state  (e.g.,  open  versus  closed).  We  used  hierarchical 
clustering  to  normalize  the  diversity  of  strains  chosen 
for  inclusion  and  showed  that  the  pan-genome  was 
open  when  restricting  to  a  diverse  set  of  100  genomes. 
To  test  whether  this  was  the  result  of  undersampling 
rather  than  removal  of  highly  similar  genomes,  we  con¬ 
ducted  a  parallel  analysis  on  about  100  CC2  isolates 
and  showed  that  the  pan-genome  was  closed.  These  re¬ 
sults  suggest  that  the  inclusion  of  the  entire  set  of  249 
strains  in  the  pan-genome  state  calculation  can  bias  the 
outcome,  resulting  in  a  closed  pan-genome.  We  con¬ 
cluded  that  including  many  closely  related  strains  (i.e., 
from  an  outbreak)  in  a  pan-genome  study  could  bias 
the  results  of  the  pan-genome  state  (open  versus 
closed).  We  suggest  using  a  normalization  step  to 
choose  strains  for  inclusion  in  the  study  or  taking  a 
bootstrapping  approach  as  we  did:  first  run  all  genomes 
to  identify  ortholog/paralog  clusters,  build  a  BSR  tree, 
normalize  isolate  collection  for  diversity,  then  re-run 
the  analysis  a  second  time  using  the  final  strain  list. 
The  bootstrap  approach  may  also  be  useful  in  situa¬ 
tions  where  a  non-target  contaminant  strain  has  been 
sequenced  or  an  isolate  has  been  misidentified,  thus 
also  serving  as  a  quality  control  step. 


Assembly  of  core  proteins  and  fGIs  into  a  pan-chromosome 

To  facilitate  analysis  and  interpretation  of  this  large  pan¬ 
genome  dataset,  an  unsupervised  approach  was  devel¬ 
oped  and  implemented  through  a  novel  graph-based 
algorithm  to  assemble  ortholog  clusters  of  core  proteins 
(75  %  core  definition)  into  the  first  reference-independent 
consensus  core  “pan-chromosome”  of  a  bacterial  species. 
This  formed  the  foundation  for  the  identification  and 
placement  along  the  core  pan-chromosome  of  fGIs  that 
are  highly  flexible  and  variable  across  the  group  of 
isolates.  Both  circular  and  linear  “assemblies”  were  pro¬ 
duced,  where  the  core  pan-chromosome  clusters  as¬ 
sembled  into  a  circle  of  3126  genes,  which  is  roughly 
the  size  estimated  for  the  95  %  core  definition.  The  lin¬ 
ear  cluster  assemblies  were  the  fGIs  that  can  be  placed  on 
the  core  pan-chromosome,  making  up  fGRs,  while  the 
non-core  circular  cluster  assemblies  were  identified  as 
plasmids  and  circular  phage  genomes.  There  is  increased 
interest  in  these  extra-chromosomal  prophages  as  sources 
of  virulence  factors  [83]  and  as  vehicles  for  rapid  adapta¬ 
tion  to  changing  environments  in  a  “carrier  state”  [84]. 
The  circular  nature  of  these  extra-chromosomal  phage  ge¬ 
nomes  is  often  not  observed;  however,  our  novel  assembly 
algorithm  can  distinguish  both  circular  and  linear  forms 
of  prophages  as  circular  and  linear  fGIs,  respectively. 

Overall,  analysis  of  genes  encoded  on  fGIs  confirms 
previously  identified  catabolic  diversity  and  reiterates  the 
versatility  and  adaptation  of  A.  baumannii  to  survive  and 
thrive  in  a  variety  of  environments  where  nutrients  are 
scarce.  The  occurrence  of  alcohol  and  aldehyde  dehydro¬ 
genase  genes  in  fGIs  of  hospital- isolated  strains  could  be 
an  indication  of  the  ability  of  Acinetobacter  to  thrive  in 
the  presence  of  ethanol  disinfectant  reagents.  With  regard 
to  the  unexpected  finding  of  additional  copies  of  essential 
housekeeping  genes  such  as  the  tRNA  synthetases,  we  do 
not  know  the  function  of  these  additional  copies  or  the 
purpose  for  having  additional  copies.  Experiments  will 
have  to  be  conducted  to  determine  whether  these  fGIs 
carrying  the  aaRS  genes  can  complement  the  core  aaRS 
genes  and  under  what  conditions  they  may  be  expressed. 
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Virulence  factor  diversity  and  strain  fitness  for  host 
survival 

It  is  noteworthy  that  among  the  ten  classes  of  genes 
and  gene  clusters  previously  shown  to  be  associated 
with  virulence  and  fitness,  to  our  surprise,  a  high  gen¬ 
omic  diversity  was  observed  in  genes  involved  in  adhe¬ 
sion  (type  I  pili  assembly),  iron  acquisition  (siderophore 
production),  and  efflux  pumps  among  the  249  isolates 
studied. 

Functional  characterization  of  type  I  pili  cluster  1 
( csuAB-E ,  A1S_2213  to  A1S_2218)  has  shown  that  its 
expression  is  required  for  biofilm  formation  and  attach¬ 
ment  to  abiotic  surfaces  such  as  plastic  medical  devices 
(e.g.,  ventilator  tubes  and  catheters).  Interestingly,  a  csu- 
deficient  strain  showed  a  loss  of  long  appendages  while 
retaining  short  pili  on  the  cell  surface  and  also  enhanced 
attachment  to  an  increased  number  of  bronchial  epithe¬ 
lial  cells  [75].  As  previously  reported  [36],  we  also  ob¬ 
served  a  relatively  higher  incidence  of  csw-deleted  isolates 
in  respiratory  samples  among  the  UH  clade  B  isolates 
which  belong  to  CC2.  It  is  tempting  to  hypothesize  that 
the  loss  of  the  csu-e ncoded  pili  is  related  to  niche 
specialization  for  increased  invasiveness  or  enhanced 
survival  at  specific  sites  of  infection,  such  as  the  re¬ 
spiratory  epithelium. 

Siderophores  are  iron-scavenging  systems  utilized  by 
pathogens  to  survive  in  mammalian  host  environments. 
Besides  siderophore  gene  cluster  3  (which  encodes  the 
well  characterized  acinetobactin  system),  it  is  unclear 
what  types  of  siderophores  (e.g.,  catecholate,  phenolate, 
hyroxamate,  carboxylate  or  mixed)  are  produced  from 
the  other  three  siderophore  clusters  in  the  A.  baumannii 
genome.  It  is  conceivable  that  the  acquisition  of  cluster  2, 
specifically  among  the  four  US  military  MRSN  wound- 
isolates,  is  to  produce  a  novel  or  stealth  type  of  iron  scav¬ 
enger  to  circumvent  host  iron  defense  systems  (e.g., 
catechol-type  siderophore  inhibitor  siderocalin  [85])  or 
outcompete  other  bacteria. 

Also  potentially  pertinent  to  iron  scavenging,  a  novel 
7.8  kb  non-RI  GI  with  a  best  match  to  the  environmen¬ 
tal  isolate  A.  calcoaceticus  PHEA-2  was  identified  at  the 
acetylT  hot  spot  among  all  seven  ST  25  isolates  ana¬ 
lyzed.  One  of  the  ORFs  located  on  the  7.8  kb  GI  encodes 
salicylate  monooxygenase,  which  converts  salicylate  to 
catechol.  In  principle,  catechol  can  directly  serve  as  an 
iron  carrier  or  building  block  for  siderophore  synthesis. 
Functional  analysis  will  be  necessary  to  determine 
whether  the  specific  acquisition  of  this  salicylate  mono¬ 
oxygenase  can  increase  the  capacity  for  iron  acquisition, 
making  it  a  novel  mechanism  that  can  reinforce  and  di¬ 
versify  siderophore  production  in  this  pathogen. 

WC-487  is  one  of  the  strains  sequenced  in  this  study 
and  originally  thought  to  be  A.  baumannii.  Both  WC- 
487  and  SDF  showed  a  general  loss  or  absence  of  genes 


for  the  virulence  factors  analyzed.  The  lack  of  key  viru¬ 
lence  factors  in  the  human  louse  strain  SDF  supports  the 
idea  that  although  currently  classified  as  A.  baumannii , 
SDF  has  adapted  to  a  life  style  different  from  that  of  a  hu¬ 
man  pathogen.  For  WC-487,  multiple  lines  of  evidence, 
such  as  placement  on  both  the  BSR  and  SNP  trees  and  the 
absence  of  key  virulence  determinants,  suggest  that  WC- 
487  is  truly  not  A.  baumannii.  Indeed,  matrix  assisted 
laser  desorption  ionization  time-of-flight  (MALDI-TOF) 
mass  spectrometry  results  suggest  that  WC-487  instead 
belongs  to  Acinetobacter  nosocomialis  (X-Z  Huang,  manu¬ 
script  in  preparation). 

Dynamics  of  drug  resistance  genes  and  RIs 

Drug  resistance  genes  are  acquired  via  IS  elements  and 
small  composite  transposons.  The  association  with  IS  el¬ 
ements,  which  are  repetitive  and  classically  result  in  the 
misassembly  of  sequence  data,  are  also  problematic 
during  the  assembly  of  protein  clusters.  Using  the  fGI 
approach,  we  only  detected  drug  resistance  genes  associ¬ 
ated  with  an  RI  in  the  comM  hot  spot,  but  not  the  other 
three  hot  spots.  Even  with  this  limitation,  we  were  able 
to  identify  three  of  the  four  known  RI  insertional  hot 
spots  as  fGRs.  Our  algorithm  was  also  able  to  identify  a 
potentially  novel  RI,  encoding  a  putative  metallo-beta- 
lactamase  in  two  of  our  sequenced  military  isolates.  We 
identified  a  ~38  kb  fGI  within  the  astA  region  that  is 
similar  in  size  to  the  ~40  kb  deletion  that  is  known  to 
have  occurred  in  some  strains  [40],  which  highlights  the 
point  that  fGIs  can  be  insertions  or  deletions. 

Interestingly,  analysis  of  the  drug  resistance  profiles 
and  genome  sequences  of  the  military  isolates  revealed  a 
potentially  novel  parC  mutation  (Glu88Lys)  in  strain 
Naval-83,  which  could  be  associated  with  quinolone  re¬ 
sistance  in  A.  baumannii.  This  mutation  has  been  shown 
to  confer  resistance  to  a  third  generation  quinolone 
(levofloxacin)  in  Haemophilus  influenzae  [36]  and  may, 
therefore,  by  analogy  also  do  so  in  A.  baumannii.  Inci¬ 
dentally,  we  also  observed  this  same  mutation  in  A.  bau - 
mannii  1656-2  [86];  however,  its  resistance  to  levofloxacin 
was  not  communicated,  stressing  the  need  to  publish  anti¬ 
biotic  drug  resistance  profiles  alongside  genomic  data. 

Vertical  and  horizontal  transmission  of  RIs 

Two  major  questions  of  RI  transmission  are  whether  they 
are  vertically  or  horizontally  acquired  and  in  how  many 
genomic  locations  they  can  reside.  Since  the  presence  of 
IS  elements  resulted  in  fragmented  genomic  and  pan¬ 
chromosome  assemblies,  we  developed  a  high-throughput 
three-step  bioinformatics  approach  to  define  the  type  and 
location  of  RIs  in  individual  isolates  to  answer  these  ques¬ 
tions.  The  approach  included  the  identification  of  gene 
fragments  at  insertion  hot  spots,  recruitment  of  genomic 
contigs  using  RI  references  and  confirmation  for  the 
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presence  of  antibiotic  resistance  gene  cassettes.  Based  on 
the  classification  of  isolates  by  RI  signatures  and  phylo¬ 
genetic  distance  defined  by  a  SNP  tree,  our  results  re¬ 
vealed  that  clonal  expansion  and  vertical  inheritance  of 
specific  RI  signatures  are  commonly  observed  (e.g.,  CC1, 
CC2,  CC3,  ST  25).  Additionally,  the  accumulation  of  RIs 
at  multiple  hot  spots  within  an  isolate  also  suggested  a 
combined  dual  mode  of  transmission  that  includes  both 
vertical  transmission  of  the  comM- RI  and  horizontal  ac¬ 
quirement  of  RIs  at  secondary  locations. 

Dynamics  of  RI  insertions  and  virulence/fitness 
determinants 

Since  the  A.  baumannii  isolates  analyzed  in  this  study 
were  collected  throughout  several  decades  between  1951 
and  2011,  we  had  an  opportunity  to  follow  the  evolution 
of  genomic  determinants  such  as  RIs  and  virulence  fac¬ 
tors  during  this  timeframe.  Comparing  single  to  multi-RI 
existence  in  an  individual  genome,  the  ratio  is  6:1  (n  =  7) 
in  pre-2000  isolates  versus  0.9:1  (n  =  125)  in  post-2000 
isolates.  Despite  the  limited  sample  size  within  the  pre- 
2000  group,  there  is  a  prevalence  of  multi-RI  insertions 
among  modern  isolates.  Specifically,  by  considering  the 
same  set  of  insertional  hot  spots  among  pre-2000  and 
post-2000  isolates,  the  post-2000  isolates  have  a  higher 
prevalence  of  these  sites  occupied,  which  likely  resulted 
from  selective  pressure  from  the  increased  use  of  antibi¬ 
otics  in  recent  years  and  possibly  higher  sampling  rates 
post-2000.  There  were  also  strains  with  no  RI  insertion 
detected.  Considering  the  draft  status  of  most  genomes 
analyzed,  more  isolates  need  to  be  finished,  particularly 
for  older  isolates  that  are  not  well  represented. 

It  is  interesting  to  note  that  the  earliest  gain  or  loss 
events  can  be  traced  back  to  isolates  collected  from  two 
or  more  decades  ago.  For  example,  the  lack  of  different 
type  I  pili  clusters  were  first  observed  in  strains  NIPH 
528  (ca.  1982),  NIPH  60  (ca.  1992),  and  NIPH  335  (ca. 
1994),  which  are  among  the  oldest  strains  in  this  dataset 
(Additional  file  20).  Similarly,  the  earliest  isolates  show¬ 
ing  the  presence  of  siderophore  cluster  2  and  the  ab¬ 
sence  of  siderophore  cluster  1  were  ATCC  17978  (ca. 
1951)  and  NIPH  190  (ca.  1993),  respectively.  These  re¬ 
sults  suggest  the  early  existence  of  genetic  determinants 
controlling  virulence  and  pathogenesis  in  decades-old 
isolates  and  their  recent  reemergence  amongst  modern 
isolates  as  shown  in  this  study. 

Conclusions 

We  conducted  the  largest  bacterial  pan-genome  analysis 
(249  genomes)  of  A.  baumannii  and  determined  that 
this  pan-genome  is  open  when  the  input  genomes  are 
normalized  for  diversity.  A  novel  graph-based  algorithm 
was  developed  and  implemented  to  assemble  ortholog 
clusters  of  core  proteins  into  the  first  reference- 


independent  ‘pan-chromosome”  of  a  bacterial  species, 
which  was  essential  for  mapping  fGIs  to  fGRs.  We  con¬ 
cluded  that  the  observed  PFGE  diversity  of  the  50  selected 
military  isolates  was  mostly  due  to  differences  in  fGI  con¬ 
tent  rather  than  chromosomal  rearrangements  as  no  rear¬ 
rangements  of  large  contigs  were  detected;  however,  our 
ability  to  detect  rearrangements  is  limited  due  to  the  frag¬ 
mented  nature  of  the  genome  assemblies. 

We  utilized  a  comparative  genomics  approach  to 
analyze  the  diversity  of  RIs  and  virulence  factors  of 
A.  baumannii .  We  demonstrated  the  existence  of  novel 
RIs  and  isolates  with  an  increased  number  of  RI  insertions 
over  time.  Clusters  of  genes  for  carbon  utilization,  sidero¬ 
phore  production,  and  pili  assembly  were  highly  variable, 
which  may  contribute  to  the  success  of  A.  baumannii  in 
surviving  and  adapting  to  different  and  changing  environ¬ 
ments.  A  vast  collection  of  genetic  determinants  and 
mechanisms  to  control  antibiotic  resistance  and  survival 
adaptations  existed  in  decades-old  isolates,  and  these 
genetic  mechanisms  appear  to  have  reemerged  among 
modern  isolates,  sometimes  in  different  genomic  loca¬ 
tions.  The  comprehensive  comparisons  of  the  highly  vari¬ 
able  and  flexible  genomic  features  in  the  context  of  whole 
genome  phylogeny  will  serve  as  genetic  landmarks  for 
surveillance  and  prediction  of  outbreaks,  understanding 
the  epidemiology  of  globally  distributed  isolates  and 
identifying  clonal  origins  of  nosocomial  infections  of 
A.  baumannii  across  healthcare  institutions. 

Materials  and  methods 

Ethical  statement 

Per  WRAIR  Policy  12-09,  the  use  of  bacterial  isolates 
without  associated  human  data  does  not  require  a  deter¬ 
mination  from  the  institutional  review  board  or  Human 
Subjects  Protection  Branch,  the  corresponding  regula¬ 
tory  office. 

Strain  isolation  and  verification 

All  50  strains  sequenced  in  this  study  were  isolated  at  US 
military  healthcare  facilities  [15,  87,  88]  and  identified  as 
A.  baumannii  by  standard  automated  biochemical  analysis 
as  described  previously  [8].  PFGE  and  16S  rRNA  typing 
was  also  used  to  further  validate  species-level  classification 
from  genomic  DNA  prepared  as  described  [20] . 

Antimicrobial  susceptibility  tests 

Antimicrobial  susceptibility  tests  were  performed  on  all 
isolates  at  the  Walter  Reed  Army  Medical  Center  clinical 
laboratory  using  the  commercially  available  BD  Phoenix 
NMIC/ID133  panel  (Becton,  Dickinson  and  Company, 
Franklin  Lakes,  NJ,  USA).  Susceptibility  was  determined 
according  to  Phoenix  criteria  and  CLSI  M-100-S-19, 
Vol.29,  No.3  2009.  For  MRSN  58,  antimicrobial 
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susceptibility  tests  were  performed  using  the  commer¬ 
cially  available  Siemens  MicroScan  panel. 

Genome  sequencing 

The  genomes  of  50  A.  baumannii  isolates  were  sequenced 
at  JCVI  by  Illumina  HiSeq  (2  x  100  bp),  or  a  combination 
of  Illumina  HiSeq  and  454  FLX  Titanium.  Additionally, 
MiSeq  (2  x  150  bp),  IonTorrent  PGM,  454  libraries,  and 
OpGen  optical  restriction  maps  generated  by  WRAIR 
were  available  to  aid  in  gap  closure  for  certain  MRSN 
strains.  Briefly,  paired-end  libraries  were  constructed  for 
each  sequencing  technology  from  randomly  nebulized 
genomic  DNA  in  the  300-800  bp  (Illumina)  and  2-3  kb 
(454)  size  ranges  following  manufacturer  recommenda¬ 
tions.  Sequence  reads  were  generated  with  a  target  average 
read  depth  of  -20-30  fold  (454)  and  -60  fold  (Illumina) 
coverage. 

Draft  genome  assembly 

Sequences  for  the  non-MRSN  isolates  were  assembled 
using  the  Celera  Assembler  version  6.1  [89].  Assembled 
contigs  undergoing  further  genome  finishing  (n  =  10)  and 
automated  gap  closure  (n  =  7)  were  ordered  based  on  align¬ 
ment  against  the  best-matching  complete  A.  baumannii 
reference  genome  using  NUCmer  [90].  Mapped  contigs 
were  never  broken  even  if  the  contig  matched  different  re¬ 
gions  of  the  reference  genome  —  the  longest  match  was 
used  for  placement.  Mapping  merely  entailed  ordering  and 
orienting  the  contigs  with  small  spacers  inserted  between 
the  contigs.  As  a  result,  all  core  gene  adjacency  information 
within  the  contigs  was  retained.  Ten  of  the  42  genomes 
underwent  manual  gap  closure  to  elevate  the  genome  status 
to  IHQD  (Table  1). 

For  the  seven  MRSN  isolates,  we  explored  several  as¬ 
sembly  strategies  to  integrate  the  JCVI  Illumina  HiSeq 
data  with  data  generated  through  various  sequencing  plat¬ 
forms  by  WRAIR.  We  decided  to  employ  a  pipeline  that 
combined  de  novo  assembly  followed  by  automated 
reference-guided  gap  closure  to  resolve  short  and  uncom¬ 
plicated  gaps  <3.5  kb  in  length.  JCVI  sequence  reads  were 
assembled  with  Velvet  version  1.0.19  [91]  and  optimized 
using  the  VelvetOptimiser  2.2.0  [92].  The  Velvet  assembly 
served  as  the  backbone  while  other  de  novo  assemblies  of 
the  WRAIR  libraries  built  with  Celera  assembler  version 
7.0  or  Velvet  version  1.0.19  served  as  references  from 
which  the  gap  sequences  would  be  predicted.  In  the  first 
round  of  gap  closure,  optical  maps  (OpGen)  were  used  to 
validate  the  assembly  as  well  as  to  order  and  orient  the 
backbone  contigs  using  SOMA  [93] .  Automated  gap  clos¬ 
ure  consisted  of  the  following  processes:  1)  to  determine 
gap  regions,  consecutive  contig  ends  were  identified  by 
alignment  against  the  consensus  sequence  generated  from 
various  de  novo  assemblies  using  NUCmer  [90];  2)  the 
identified  contig  ends  were  used  to  recruit  reads  from  the 


JCVI  Illumina  paired-end  library  using  Burrows- Wheeler 
Aligner  (BWA)  0.7.3  [94];  3)  the  recruited  reads  were 
assembled  using  the  CLC  command  line  tool  clc_mapper 
from  clc-assembly-cell  v.4.0.11  [95]  by  mapping  the  re¬ 
cruited  reads  from  step  2  to  the  gap  regions  from  step  1 
to  generate  a  new  consensus  sequence  for  each  of  the 
gaps;  4)  the  contigs,  and  if  available,  the  new  gap  se¬ 
quences  from  step  3,  were  stitched  together  to  resolve  the 
gaps;  5)  the  CLC  clc Jind-Variations  command  line  tool, 
also  from  clc-assembly-cell  v.4.0.11,  was  run  to  validate 
the  new  consensus  sequence  by  determining  the  existence 
of  any  Ox  coverage  regions.  If  any  Ox  regions  were  found, 
the  original  gap  remained.  BLAST  v.2.2.28  [96]  was  then 
used  to  select  the  closest  matching  complete  A.  bau¬ 
mannii  genome  in  GenBank  to  serve  as  the  reference 
for  scaffolding  the  resulting  contigs  from  the  first  round 
of  the  automated  gap  closure,  using  NUCmer  [90] 
alignments.  The  contigs  then  proceeded  through  a  sec¬ 
ond  round  of  the  automated  gap  closure  process. 

Annotation 

Contigs  were  annotated  for  protein-  and  RNA-encoding 
features  using  the  JCVI  automated  annotation  pipeline  es¬ 
sentially  as  described  previously  [44,  47,  97,  98]  except 
hidden  Markov  models  were  run  using  HMMER3  [99] . 

Identification  of  antibiotic  resistance  genes 

Genes  conferring  drug  resistance  were  identified  using 
the  RGI  (Resistance  Gene  Identifier,  version  2)  tool  in 
CARD  (Comprehensive  Antibiotic  Resistance  Database) 
[100].  For  each  genome  in  this  study  (Table  1)  a  multi- 
FASTA  composite  file  was  loaded  into  RGI  and  the  out¬ 
put  saved  for  further  parsing.  Results  were  filtered  by 
selecting  the  highest  percent  identity  match  for  each 
ORF.  Genes  that  were  regulators  or  modulators  were 
filtered  out.  Genes  identified  were  classified  by  their 
antibiotic  resistance  ontology  assigned  by  CARD;  ontol¬ 
ogies  are  based  on  resistance  mechanisms,  determi¬ 
nants  and  targets. 

Several  other  genes  were  identified  by  BLAST  analysis. 
A  database  of  additional  drug  resistance  genes  was  com¬ 
piled  from  the  GenBank  accessions  of  previously  curated 
lists  [17,  101].  Genomes  were  searched  against  this  data¬ 
base  using  BLASTP  and  unique  ORFs  not  already  identi¬ 
fied  by  RGI  were  examined.  Matches  with  >90  %  amino 
acid  identity  were  assigned  a  classification. 

MLST  analysis 

MLST  was  determined  using  an  in-house  automated 
pipeline  that  first  searches  for  homologs  of  each  gene  of 
the  typing  schema  ( cpn60:fusA:gltA:pyrG:recA:rplB:rpoB ) 
from  [73,  102],  using  BLASTN  [96].  MLST  homologs 
were  extracted  from  the  genome  sequence  and 
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compared  with  an  MLST  allele  database  to  generate  the 
allele  number  and  ST  for  each  genome. 

Pan-genome  analysis 

Clusters  of  orthologous  proteins  were  generated  (Additional 
file  5)  by  version  3.18  of  PanOCT  [35]  using  default  pa¬ 
rameters  (Additional  file  23).  In  order  to  plot  “power  law 
and  exponential  regressions  for  new  genes  discovered  with 
the  availability  of  additional  genome  sequences”,  as  de¬ 
fined  by  Tettelin  et  al.  [41],  we  adapted  the  R  scripts,  com- 
pute_pangenome.R  and  plot_pangenome.R ,  from  Park 
et  al.  [103]  and  developed  a  Perl  script,  paralog_matchta- 
ble.pl.  Since  PanOCT  does  not  place  paralogs  into  its 
ortholog  clusters,  but  does  produce  a  paralogs.txt  file  that 
specifies  which  clusters  are  paralogs,  an  in-house  PERL 
script,  paralogs_matchtable.pl ,  was  created  to  merge  par- 
alogous  clusters  (Additional  file  23).  This  is  necessary  be¬ 
cause  analysis  of  core  and  novel  genes  has  historically 
been  defined  for  clusters  containing  all  paralogs  [42,  103- 
107].  In  the  past,  core  and  novel  pan-genome  plots  were 
computed  from  all  possible  combinations  in  genome 
order,  but  this  is  computationally  prohibitive  when  the 
number  of  genomes  is  over  100.  To  overcome  this  limita¬ 
tion,  compute_pangenome.R  was  modified  to  randomly 
sample  without  replacement  a  subset  of  500  combinations 
in  genome  order  of  addition.  The  output  of  this  script  is  a 
set  of  data  where  each  row  contains  columns  for  core,  dis¬ 
pensable,  unique,  and  genes  novel  for  the  last  genome 
added.  The  plot_pangenome.R  script  computes  the  me¬ 
dians  of  the  compute_pangeome.R  output  and  uses  the 
nonlinear  least  squares,  nls,  function  in  R  to  find  power 
law  and  exponential  models  to  fit  the  medians. 

Consensus  assemblies  of  the  core  and  the  flexible  parts 
of  the  A.  baumannii  pan-genome  were  calculated  using 
outputs  from  PanOCT.  The  consensus  core  pan¬ 
chromosome  was  computed  by  running  an  in-house 
PERL  script,  gene_order.pl,  using  the  PanOCT  75_cor- 
e_adjacency_vector.txt ,  O_core_adjacency_vector.txt ,  and 
the  centroids.fasta  output  files  as  input  (Additional  file 
23).  The  7S_core_adjacency_vector.txt  file  lists  the  set  of 
adjacent  core  gene  clusters  (called  “adjacencies”)  and 
specifies  which  genomes  contain  them.  Core  gene  clus¬ 
ters  are  defined  as  gene  clusters  conserved  in  at  least 
some  threshold  number  of  genomes  (e.g.,  75  %).  A  core 
gene  cluster  is  adjacent  to  another  core  gene  cluster  in  a 
given  genome  if  the  representative  cluster  members  for 
that  genome  are  adjacent  (i.e.,  they  are  on  the  same  con- 
tig  and  have  no  other  core  genes  between  them).  The 
consensus  assembly  of  the  core  gene  clusters  is  the  set 
of  adjacencies  supported  by  the  largest  number  of  ge¬ 
nomes  (Additional  file  8). 

Conceptually,  the  order  and  orientation  of  these  clus¬ 
ters  can  be  depicted  as  linear  or  circular  arrangements, 
analogous  to  sequence  assembly.  The  linear  paths  can 


result  from  contig  breaks,  linear  chromosomes  or  plas¬ 
mids,  or  because  there  is  a  disagreement  in  the  juxtapos¬ 
ition  of  neighboring  core  clusters  between  two  or  more 
genomes.  The  circular  paths  can  represent  circular  chro¬ 
mosomes,  plasmids  or  occasionally  small  elements  that 
are  inverted  in  different  genomes. 

fGIs  were  defined  in  [50]  as  GIs  encoding  similar  types 
of  functions  (e.g.,  O -antigen,  phage,  pili),  having  the  same 
genomic  location,  but  a  variable  gene  content.  We  define 
fGIs  more  loosely  to  be  variable  (i.e.,  “flexible”)  linear  as¬ 
semblies  of  noncore  genes  present  between  core  gene 
clusters.  These  assemblies  were  constructed  and  the  fGIs 
identified  using  the  same  gene_order.pl  script;  however,  the 
PanOCT  output  file  O_core_adjacency_vector.txt  is  used  (0 
%  threshold)  as  input  so  that  all  gene  clusters  are  consid¬ 
ered,  not  just  core  gene  clusters  (Additional  file  9).  The 
fGIs  are  not  allowed  to  extend  into  core  gene  clusters 
already  in  the  core  pan-genome;  rather,  they  are  terminated 
at  a  core  gene  cluster  and  the  core  gene  cluster  is  labeled 
as  an  fGI  insertion  site. 

Pan-genome  tree 

A  UPGMA  (unweighted  pair  group  method  with  arith¬ 
metic  mean)  tree  was  constructed  using  the  mean  of  the 
BSR  as  described  previously  [108].  The  PanOCT  output 
file  100_pairwise_BSR_distance_matrix_phylip.txt  was 
used  as  input  for  Neighbor  [109,  110]  to  build  an 
unrooted  tree.  This  PanOCT  output  file  is  a  Phylip-style 
distance  matrix  derived  from  the  pairwise  mean  BSR  of 
core  proteins  present  in  100  %  of  genomes  where  a  single 
value  is  presented  for  each  pair  of  genomes  in  the  pan¬ 
genome. 

SNP  tree 

A  phylogenetic  tree  was  inferred  from  SNPs  identified 
among  253  Acinetobacter  genomes.  SNPs  were  identified 
by  kSNP  [111]  with  a  requirement  that  at  least  80  %  of 
the  genomes  (i.e.,  203  genomes)  have  a  nucleotide  at  a 
given  SNP  position  in  order  for  the  SNP  to  be  consid¬ 
ered  for  inclusion  in  downstream  analysis.  A  total  of 
207,619  identified  SNP  positions  were  further  filtered  to 
remove  SNPs  in  regions  likely  undergoing  recombin¬ 
ation  by  detecting  regions  with  unusually  highly  SNP 
density.  For  this  filtering  step,  a  set  of  pairwise  SNPs 
was  identified  between  the  finished  genome  of  A.  bau¬ 
mannii  ACICU  and  related  ST  2  genomes  using  the 
SNP  export  functionality  within  progressiveMauve  [112]. 
The  pairwise  SNP  density  was  computed  based  on 
ACICU  positions  shared  among  a  subset  of  genomes 
with  the  fewest  total  number  of  pairwise  SNPs.  Any 
regions  with  higher  than  10  SNPs/kb  for  any  strain  were 
considered  as  potential  recombination  regions.  After 
filtering  out  these  regions  there  were  152,995  presumed 
non-recombinant  SNPs.  These  SNPs  were  used  to  generate 
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a  maximum-likelihood  tree  using  RAxML  [113]  with  100  locations  of  virulence  genes  and  RI  insertions  studied  are 
bootstrap  replicates.  summarized  in  Additional  file  24. 


Identification  of  RI  signatures 

A  high-throughput  approach  was  developed  to  identify 
RI  signatures  for  249  isolates,  including  draft  genome  as¬ 
semblies.  In  draft  genomes,  RIs  are  often  poorly  assem¬ 
bled  due  to  repetitive  elements.  In  this  approach,  the 
RI  signatures  were  determined  based  on:  (i)  junction 
fragments  of  target  genes,  (ii)  sequence  similarities  to 
previously  reported  RIs  and  homology  to  antibiotic 
resistance  cassettes  within  individual  RIs.  First,  the 
presence  of  target  gene  junction  fragments  was  identified 
using  a  BLASTN  search  against  five  target  genes  known 
to  harbor  RIs  in  A.  baumannii  (Additional  file  11).  A  sum¬ 
mary  of  target  gene  lengths  when  intact  or  carrying  junc¬ 
tion  fragments  in  the  presence  of  RI  insertion  is  shown  in 
Additional  file  12h.  Second,  contigs  identified  to  carry  the 
target  gene  and  junction  fragments  were  searched  against 
a  collection  of  carefully  selected  RIs,  which  captures  the 
diversity  of  RIs  identified  to-date  (Additional  file  11),  using 
BLAST  and  NUCmer  [90].  RI  signature  assignments 
were  performed  based  on  BLAST  similarity  and  mum- 
merplot  alignments  to  the  set  of  reference  RIs.  An  ex¬ 
ample  RI  signature  assignment  for  comM  is  shown  in 
Additional  file  12i. 

Gain/loss  of  virulence  factor  gene  clusters 

For  every  gene  of  interest,  the  average  BSR  value  between 
the  centroid  protein  of  the  ortholog  cluster  and  the  ortho¬ 
log  protein  identified  in  each  isolate  were  obtained  from 
the  PanOCT  output  (e.g.,  pairwise_in_cluster.txt  and  cen- 
troids.fasta).  We  use  the  term  “centroid”,  which  is  technic¬ 
ally  the  medoid,  to  be  a  representative  protein  sequence  of 
a  cluster  whose  average  dissimilarity  to  all  the  sequences  in 
the  cluster  is  minimal  [114].  An  average  BSR  value  close  to 
1  indicates  identical  sequences,  a  value  near  0  indicates  ab¬ 
sence  of  the  protein  in  the  given  isolate,  and  a  value  around 
0.5  suggests  a  truncation  of  the  protein  with  respect  to  the 
centroid.  A  matrix  of  averaged  BSR  values  between 
centroid-ortholog  pairs  was  constructed  with  genes  as 
rows  and  isolates  as  columns  (Additional  file  19). 

To  identify  initial  evidence  of  gain  or  loss  of  gene  clus¬ 
ters  at  the  ortholog  protein  level,  the  centroid-to- 
ortholog  BSR  matrix  was  analyzed  using  hierarchical 
clustering  in  MeV  version  4.9.0  [115].  Additionally,  re¬ 
gions  of  gene  cluster  gain/loss  and  flanking  genomics  se¬ 
quences  were  then  examined  at  the  genomic  contig  level 
using  pairwise  sequence  alignments  by  MUMmer  ver¬ 
sion  3.0  [116].  Alignments  of  genome  contigs  comparing 
selected  reference  genomes  and  isolates  of  interest  to 
show  gain/loss  of  gene  clusters  were  plotted  using  Easy- 
Fig  [117].  The  set  of  A  baumannii  virulence  genes  ana¬ 
lyzed  is  shown  in  Additional  file  17.  The  genomic 


Software,  genome  sequences,  and  raw  data  availability 


The  pan-genome  analysis  software  developed  and  imple¬ 
mented  in  this  study,  including  panoct.pl  version  3.18, 
paralog_matchtable.pl ,  and  gene_order.pl  are  publicly 
available  in  the  code  section  of  the  PanOCT  SourceForge 
page  [118],  under  the  GNU  General  Public  License. 

GenBank  accession  numbers  for  the  50  genomes 
that  were  sequenced,  assembled  and  annotated  in  this 
study  are  listed  in  Table  1  and  Additional  file  2.  The 
whole  genome  sequencing  data  are  available  at  [119] 


under  the  following  accession  numbers:  AFCZ00000000, 


AFDA00000000, 

AFDL00000000, 

AFDO00000000, 

AMDE00000000, 

AMDR00000000, 

AMFH00000000, 

AMFL00000000, 

AMFT 00000000, 

AMFW00000000, 

AMFZ00000000, 

AMGF00000000, 

AMSW00000000, 

AMSZ00000000, 

AMZR00000000, 

JPHV00000000, 

JPHY00000000, 


AFDB00000000, 
AFDM00000000, 
ALAL00000000, 
AMDF00000000, 
AMEI00000000, 
AMFI00000000, 
AMFP00000000, 
AMFU00000000, 
AMFX00000000, 
AMGA00000000, 
AMGG00000000, 
AMSX00000000, 
AMTA00000000, 
AMZT 00000000, 
JPHW00000000, 
JPHZ00000000, 


AFDK00000000, 
AFDN00000000, 
ALII00000000, 
AMDQ00000000, 
AMEJ 00000000, 
AMFK00000000, 
AMFS00000000, 
AMFV00000000, 
AMFY00000000, 
AMGE00000000, 
AMGH00000000, 
AMSY00000000, 
AMTB00000000, 
AMZUO 1000000, 
JPHX00000000, 
JPIA00000000, 


JPIB00000000.  The  raw  sequence  reads  generated  at 


JCVI  are  available  at  [120]  under  the  following  acces¬ 


sion  numbers 

SRR1945427, 

SRX027591, 

SRX027596, 

SRX031275, 

SRX101601, 

SRX101606, 

SRX101796, 

SRX1 10040, 

SRX1 10095, 

SRX1 10099, 

SRX110103, 

SRX110107, 

SRX110112, 

SRX110118, 

SRX110122, 

SRX110126. 


i:  SRR1945422, 
SRR1945428, 
SRX027592, 
SRX027597, 
SRX032336, 
SRX101602, 
SRX101793, 
SRX101797, 
SRX1 10090, 
SRX1 10096, 
SRX1 10100, 
SRX1 10104, 
SRX1 10109, 
SRX110113, 
SRX110119, 
SRX1 10123, 


SRR1945425, 
SRR1945430, 
SRX027593, 
SRX031272, 
SRX101596, 
SRX101603, 
SRX101794, 
SRX101798, 
SRX110091, 
SRX1 10097, 
SRX110101, 
SRX110105, 
SRX110110, 
SRX110114, 
SRX110120, 
SRX110124, 


SRR1945426, 
SRR1946597, 
SRX027595, 
SRX031273, 
SRX101598, 
SRX101604, 
SRX101795, 
SRX101800, 
SRX1 10092, 
SRX110098, 
SRX110102, 
SRX110106, 
SRX110111, 
SRX110115, 
SRX110121, 
SRX110125, 


The  combined  JCVI-annotated  fasta-formatted  amino 
acid  sequences  and  combined  attribute  files  required 
for  panoct.pl  are  available  as  Additional  files  25  and  26, 
respectively.  An  essential  set  of  raw  output  files  from 


47 


Chan  et  al.  Genome  Biology  (2015)  16:143 


Page  24  of  28 


panoct.pl  and  gene_order.pl  is  also  available  in  Additional 
file  27. 


Additional  files 


Additional  file  1:  Figure  SI.  Pulse  field  gel  electrophoresis  (PFGE)  of  A 
baumannii  isolates  collected  from  the  Military  healthcare  system.  A 
dendrogram  was  produced  based  on  the  analysis  of  PFGE  banding 
patterns.  Genomic  DNA  was  digested  with  Apal,  separated  by  clamped 
homogenous  electric  fields  (CHIEF)  gel  electrophoresis  and  analyzed  with 
the  Dice  coefficient  as  described  previously  [15].  Isolates  with  greater 
than  or  equal  to  90  %  similarity  are  considered  to  be  the  same  strain. 
Strain  names  are  noted  at  the  right  and  their  corresponding  MLST 
sequence  types  in  parentheses. 

Additional  file  2:  Table  SI.  GenBank  identifiers  for  A.  baumannii 
genomes  sequenced  at  JCVI  in  this  study. 

Additional  file  3:  Table  S2.  Antibiotic  resistance  susceptibility  profiles 
and  predicted  resistance  mechanisms  for  A.  baumannii  genomes 
sequenced  in  this  study. 

Additional  file  4:  Table  S3.  All  A.  baumannii  isolates  used  in  this  study. 

Additional  file  5:  Text  file  of  all  PanOCT  clusters. 

Additional  file  6:  Figure  S2.  Phylogenetic  tree  of  the  A.  baumannii 
pan-genome.  A  dendrogram  was  constructed  based  on  the  mean  of  the 
pairwise  BLASTP  score  ratios  (BSRs)  of  core  protein  clusters  that  were 
present  in  100  %  of  all  249  A.  baumannii  isolates  constituting  the 
pan-genome.  The  BSR  tree  was  generated  from  the  Pa nOCT-de rived  BSR 
distance  matrix  using  the  Interactive  Tree  of  Life  (iTOL).  The  five  most 
abundant  MLST  sequence  types  with  available  genome  sequence  are 
illustrated  by  color  highlights  (see  inset  key).  The  50  isolates  sequenced 
in  this  study  are  noted  with  a  red  bar  on  the  outside  of  the  tree. 

Genomes  chosen  for  sub-sampling  of  the  pan-genome  by  hierarchical 
clustering  are  marked  with  a  gold  bar  on  the  outside  of  the  tree. 
Additional  file  7:  Figure  S3.  A.  baumannii  pan-genome  of  ST  2 
genomes.  The  pan-genome  size  ( left  column )  and  the  number  of  novel 
genes  discovered  with  the  addition  of  each  new  genome  ( right  column ) 
were  estimated  for  1 1 1  ST  2  genomes  using  a  pan-genome  model  based 
on  the  original  Tettelin  et  al.  model  [42],  Purple  circles  are  the  median  of 
each  distribution  {gray  circles).  Power  law  {red  lines )  and  exponential  {blue 
lines)  regressions  were  plotted  to  determine  a  (open/closed  status)  and 
tg(0);  the  average  extrapolated  number  of  strain-specific/novel  genes, 
respectively. 

Additional  file  8:  Text  file  of  cluster  assembly  composition. 
Additional  file  9:  Text  file  of  the  location  and  composition  of  fGRs. 

Additional  file  10:  Table  S4.  Chromosomally  encoded  antibiotic 
resistance  genes  found  within  fGIs  and  fGRs. 

Additional  file  11:  Table  S5.  Resistance  island  target  sites. 

Additional  file  12:  Table  S6.  a  Distribution  of  Rl-positive  isolates 
among  A.  baumannii  genomes  analyzed,  b  Summary  of  major  Rl 
signatures  identified  among  CC1,  CC2,  CC3,  and  ST  25  isolates.  All  Rl 
signatures  identified  in  (c)  CC1,  (d)  CC2,  (e)  CC3,  (f)  ST25,  and  (g)  other 
isolates,  h  A  list  of  Rl  target  genes  showing  total  gene  length 
detected  when  intact  and  having  no  Rl  insertion,  or  carrying  a  Rl 
insertion  with  junction  fragments,  i  Total  gene  length  of  the  comM  target 
gene  detected  in  a  collection  of  finished  A.  baumannii  genomes. 
Additional  file  13:  Figure  S4.  A.  baumannii  whole  genome  SNP  tree.  A 
whole  genome  SNP  tree  was  constructed  for  249  A.  baumannii  genomes 
and  four  Acinetobacter  spp.  genomes.  Major  clonal  complexes  (CC1,  CC2, 
and  CC3),  ST  25,  and  US  hospital  isolates  forming  CC2  UH  clades  A-E  are 
highlighted  with  a  colored  background  (see  key)  [40].  A  colored  box 
following  the  strain  name  marks  the  50  isolates  sequenced  in  this  study 
{red)  and  the  finished  public  reference  genomes  {blue).  The  annotation 
table  {right)  summarizes  (i)  Rl  signatures  and  (ii)  virulence  factor  diversity 
reported  in  this  study.  See  main  text  for  a  more  detailed  description.  Briefly, 
(i)  Rl  insertions  were  examined  at  the  following  gene  loci:  comM,  pho,  astA, 
acetylT,  acylCS,  and  a  hypothetical  protein  (Additional  file  1 1).  Specific  Rl 


insertion  types  detected  at  individual  insertion  loci  were  reported.  A  colored 
cell  in  the  Rl  section  of  the  annotation  table  represents  the  presence  of  an 
Rl  feature  for  a  given  isolate.  For  example,  AbaR3  and  AbaR4  type  RIs  are 
found  at  comM  in  CC1  isolates,  whereas  AbGRII  type  RIs  instead  are  de¬ 
tected  at  comM  in  CC2  isolates.  "P"  was  used  to  indicate  that  the  Tn7548  Rl 
was  detected  on  a  plasmid  in  three  finished  genomes  instead  of  the 
acetylT  locus  located  on  the  chromosome  (this  study)  [79].  "#"  Rl  was 
previously  reported  at  the  astA  locus  but  not  in  this  study  [71],  "***", 
and  "5"  represent  extreme  antibiotic  resistance,  strong  resistance, 
and  susceptible  to  antibiotics  as  determined  in  this  study,  respectively 
(Additional  file  3).  The  black  triangle  indicates  a  branch  node  where  the 
loss  of  AbGRII  insertion  at  the  comM  locus  is  suspected.  Previously 
reported  Aba-type  RIs  are  listed  in  black  bold  to  the  right  of  the  annotation 
table,  (ii)  Virulence  factor  diversity  was  detected  as  specific  gain  or  loss  of 
gene  clusters  involved  in  type  I  pili  assembly,  siderophore  production,  or 
efflux.  In  general,  a  colored  cell  in  the  virulence  section  of  the  annotation 
table  represents  the  detection  of  a  genomic  variation,  which  in  most  cases 
indicates  the  loss  of  a  gene  cluster  in  a  given  isolate;  the  only  exception  is 
for  siderophore  cluster  2  in  which  a  colored  cell  represents  the  specific  gain 
of  the  gene  cluster  in  the  isolate.  For  example,  type  I  pilus  cluster  1  {csu) 
was  lost  in  two  ST  10  isolates,  and  also  a  subset  of  ST  2  isolates  including 
the  entire  group  of  nine  UH  clade  B  strains  and  also  the  multi-drug  resistant 
strain  MDR-ZJ06.  In  contrast,  across  all  249  isolates  analyzed,  siderophore 
cluster  2  was  only  detected  and  thus  specifically  gained  in  ATCC  17978,  and 
also  six  isolates  belonging  to  strain  types  ST  81  and  ST  94.  A  SNP  matrix  file 
containing  1 50,000  genomic  variants  for  SNP  tree  construction  is 
provided  as  Additional  file  14. 

Additional  file  14:  Text  file  of  SNP  variants  identified  across  150,000 
genomic  positions  (ACICU  reference  coordinates)  and  249  A.  baumannii 
isolates  analyzed.  This  file  was  compressed  using  tar  and  7-Zip  (a). 

Additional  file  15:  Figure  S5.  Novel  composite  IS26  Rl  inserted  into  the 
acylCS  gene  locus.  ACICU  was  used  as  a  reference  to  show  that  an  18  kb 
composite  IS26  Rl  replaced  the  original  8  kb  genomic  region  in  Rl-positive 
isolates.  The  composite  Rl  contains  two  resistance  gene  cassettes.  The 
oldest  isolate  was  NIPH  1669  (1997),  which  only  carried  the  5' fragment 
of  the  composite  IS26  Rl  including  one  resistance  gene  cassette.  Pairwise 
nucleotide  identity  shown  in  a  red  to  blue  (100  %  identity)  color  scale.  Key: 
ORFs  {arrows)-,  drug  resistance  genes  located  on  Rl  {green);  deleted  genes 
{gray);  immediate  Rl  flanking  genes  acylCS  and  a  predicted  exporter  {dark 
orange );  other  flanking  genes  {orange). 

Additional  file  16:  Figure  S6.  Novel  genomic  fragment  inserted  into 
the  acetylT  gene  locus.  AB307-0294  was  used  as  a  reference  to  show  the 
Rl  and  non-RI  type  insertions  found  at  the  acetylT  locus,  a  A  7.8  kb 
non-RI  fragment  was  detected  juxtaposed  to  the  acetylT  locus  across  all 
ST  25  isolates  analyzed.  The  annotated  salicylate  monooxygenase  gene 
located  on  the  genomic  fragment  could  be  involved  in  catechol 
production,  b  A  Tn7548  Rl  insertion  was  detected  at  the  acetylT  locus  in 
other  isolates  (e.g.,  multi-drug  resistance  MDR-TJ  isolate).  Pairwise  nucleotide 
identity  shown  in  a  red  to  blue  (100  %  identity)  color  scale.  Key:  open 
reading-frames  {thick  arrows );  drug  resistance  genes  located  on  Rl  {green );  Rl 
insertion  target  acetylT  {dark  green);  Rl  flanking  genes  {orange). 

Additional  file  17:  Table  S7.  Virulence  and  fitness  factors  used  in  this 
study. 

Additional  file  18:  Figure  S7.  Diversity  of  virulence  and  fitness  factors 
based  on  centroid-to-ortholog  BSR.  Genomic  regions  involved  in  assembly 
of  type  I  pili,  siderophore  production,  and  efflux  were  highly  variable  and 
showed  specific  gain  or  loss  of  the  entire  gene  clusters  in  isolates  analyzed. 
In  the  heat  map,  the  presence,  absence,  and  low  similarity  of  a  protein 
ortholog  compared  with  its  centroid  is  shown  in  yellow  (BSR  =  1),  blue 
(BSR  =  0),  and  gray,  respectively.  The  list  of  virulence  genes  analyzed 
and  the  BSR-derived  heat  map  file  are  provided  in  Additional  files  17 
and  1 9,  respectively. 

Additional  file  19:  Excel  file  containing  the  distance  matrix  of 
centroid-ortholog  pairs  based  on  BSR. 

Additional  file  20:  Table  S8.  Diversity  of  type  I  pili  gene  clusters  in 
isolates  analyzed. 

Additional  file  21:  Figure  S8.  Loss  of  type  I  pili  cluster  3  gene  cluster. 

A  complete  loss  of  the  type  I  pilus  cluster  3  was  observed  in  all  ST  2  isolates 
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(e.g.,  ACICU,  Naval-78,  NIPH  528)  and  additional  strain  types  including  ST  25 
(e.gv  Naval-1 8,  NIPH  1 46),  ST  79  (e.g.,  UH7907),  ST  1 1 3,  ST  21 5,  and  others. 
Pairwise  nucleotide  identity  shown  in  a  red  to  blue  (100  %  identity)  color 
scale.  Key:  open  reading-frames  {thick  arrows)-,  type  I  pilus  cluster  3  ( green ); 
genes  immediately  flanking  deletion  {dark  orange );  other  flanking  genes 
{orange). 

Additional  file  22:  Figure  S9.  Loss  of  siderophore  cluster  1  gene 
cluster.  A  complete  loss  of  siderophore  cluster  1  was  observed  in  six 
isolates  of  mixed  strain  types,  but  within  short  phylogenetic  distance  as 
shown  on  the  whole  genome  SNP  tree.  Pairwise  nucleotide  identity 
shown  in  a  red  to  blue  (100  %  identity)  color  scale.  Key:  open  reading -frames 
{thick  arrows );  siderophore  cluster  1  {green);  genes  immediately  flanking 
deletion  {dark  orange);  other  flanking  genes  {orange). 

Additional  file  23:  Command  line  arguments  used  for  running 
NCBI.  blastall,  panoct.pl,  paralog_matchtable.pl,  and  gene_order.pl. 

Additional  file  24:  Figure  S10.  Genomic  locations  of  RIs  and  virulence 
factors  analyzed  in  this  study.  The  ACICU  genome  was  used  as  a 
reference  backbone  to  show  genomic  features  analyzed  in  this  study. 
Note  that  not  all  features  were  detected  in  the  ACICU  genome. 

Additional  file  25:  The  JCVI-annotated  combined  amino  acid  fasta 
file  used  as  input  to  panoct.pl.  This  file  was  compressed  using  tar  and 
7-Zip  (a). 

Additional  file  26:  The  combined  genome  attribute  file  required  to 
run  panoct.pl,  which  contains  contig  identifiers,  protein  identifiers, 
coordinates  of  protein  coding  regions,  protein  annotation,  and 
genome  identifiers.  This  file  was  compressed  using  tar  and  7-Zip  (a). 

Additional  file  27:  A  minimal  set  of  output  files  generated  by 
panoct.pl  and  gene_order.pl.  Also  included  are  look-up  tables  to  cross 
reference  JCVI  internal  genome  identifiers  with  GenBank  identifiers  as 
well  as  to  cross-reference  JCVI  internal  locus  names  with  GenBank  locus 
names.  Note  that  many  more  output  files  were  generated,  which  are 
informative,  but  not  required  to  interpret  the  results  of  pan-genome 
analysis,  except  the  output  file  of  paralog_matchtable.pl,  which  was  too 
large  and  easily  made  from  the  provided  files.  This  file  was  compressed 
using  tar  and  7-Zip  (a). 
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Appendix  II.  Select  differentially  expressed  genes  in  A.  baumannii  confrontations.  LogFC  values  are 
only  shown  if  the  corresponding  FDR  was  <  0.05. 


MRSN  7339 

Locus  ID 

LogFC  of  A.  baumannii  confronted  with: 

Annotation 

E.  hormaechoi 

K.  pneumoniae 

C.  jeikeium 

Biofilm 

pgaD.2 

T634  1013 

0.81 

BfmS-two-component  system  sensor  kinase  protein 

T634  0792 

-0.56 

Type  1  pill 

Type  1  pili  cluster  2.4 

T634  1729 

0.79 

Type  1  pili  cluster  3.4 

T634  2348 

-0.86 

CsuA  -  type  1  pili  cluster  1 

T634  2492 

-3.05 

CsuA/B  -  type  1  pili  cluster  1 

T634  2493 

-3.09 

CsuB  -  type  1  pili  cluster  1 

T634  2491 

1.74 

-3.02 

CsuC  -  type  1  pili  cluster  1  -  chaperone 

T634  2490 

-2.45 

CsuD  -  type  1  pili  cluster  1  •  usher 

T634  2489 

-2.69 

CsuE  -  type  1  pili  cluster  1  *  tip  adhesion 

T634  2488 

-2.14 

Type  IV  pili 

pilB  -  type  IV  pili  cluster  1 

T634  0406 

1.19 

pilC  -  type  IV  pili  cluster  1 

T634  0405 

1.80 

pilT  -  type  IV  pili  cluster  2 

T634  0943 

1.18 

pilU  -  type  IV  pili  cluster  2 

T634  0942 

0.98 

comL  -  type  IV  pili  cluster  3 

T634  3583 

2.04 

comM  -  type  IV  pili  cluster  3 

T634  3586 

1.80 

comN  -  type  IV  pili  cluster  3 

T634  3585 

2.12 

comO  -  type  IV  pili  cluster  3 

T634  3584 

1.87 

comQ  -  type  IV  pili  cluster  3 

T634  3582 

1.94 

pilZ  -  type  IV  pili  cluster  4 

T634  1785 

0.74 

pilG  -  type  IV  pili  cluster  5 

T634  3239 

1.59 

pilH  -  type  IV  pili  cluster  5 

T634  3238 

1.14 

pill  -  type  IV  pili  cluster  5 

T634  3237 

1.34 

pilJ  -  type  IV  pili  cluster  5 

T634  3236 

0.76 

0.86 

2.41 

chpA-like  -  type  IV  pili  cluster  5 

T634  3235 

2.23 

A1S0232  -  type  IV  pili  cluster  6 

T634  0311 

0.69 

Iron 

siderophore  cluster  1.3  -  MFS  superfamily  MDR  protein 

T634  1851 

-0.65 

siderophore  cluster  1.11 

T634  1860 

1.02 

siderophore  cluster  3.1  -  acinetobactin-isochorismate  synthase 

T634  2721 

1.22 

-0.65 

siderophore  cluster  3.2  -  acinetobactin  -  Basl 

T634  2722 

1.49 

siderophore  cluster  3.3  -  acinetobactin  -  thioesterase 

T634  2723 

1.61 

siderophore  cluster  3.4  -  acinetobactin 

T634  2724 

1.66 

0.73 

-0.58 

siderophore  cluster  3.7  -  acinetobactin 

T634  2725 

1.72 

0.71 

-0.63 

siderophore  cluster  3.8  -  acinetobactin 

T634  2726 

1.79 

siderophore  cluster  3.9  •  acinetobactin-bifunctional  isochorismate  lyase  /  aryl  carrier  protein 

T634  2727 

1.65 

0.76 

-0.56 

siderophore  cluster  3.10  -  acinetobactin-enterobactin  synthase  subunit  E 

T634  2728 

1.55 

0.58 

siderophore  cluster  3.11  -  acinetobactin  -  BasE 

T634  2729 

1.73 

0.64 

-0.83 

siderophore  cluster  3.12  -  acinetobactin  -  BasD 

T634  2730 

1.92 

0.82 

-0.54 

siderophore  cluster  3.13  -  acinetobactin  -  BasC-acinetobactin  siderophore  biosynthesis  protein 

T634  2731 

1.73 

0.72 

-0.79 

siderophore  cluster  3.14  -  acinetobactin  -  BauA-ferric  acinetobactin  receptor 

T634  2732 

1.53 

0.78 

siderophore  cluster  3.15  -  acinetobactin  -  BauB-ferric  acinetobactin  binding  protein 

T634  2733 

1.48 

siderophore  cluster  3.16  -  acinetobactin  -  BauE-ferric  acinetobactin  transport  system  ATP-binding  protein 

T634  2734 

1.18 

siderophore  cluster  3.17  -  acinetobactin  -  BauC-femc  acinetobactin  transport  system  permease 

T634  2735 

1.21 

siderophore  cluster  3.18  -  acinetobactin  -  BauD-femc  acinetobactin  transport  system  permease 

T634_2736 

1.22 

siderophore  cluster  3.19  -  acinetobactin  -  BasB-non-ribosomal  peptide  synthase 

T6342738 

1.47 

0.69 

siderophore  cluster  3.20  -  acinetobactin  -  BasA-acinetobactin  biosynthesis  protein 

T6342739 

0.62 

siderophore  cluster  3.21  -  acinetobactin  -  BauF-acinetobactin  utilization  protein 

T634  2741 

0.64 

entA  -  siderophore  cluster  5  -  2,3-dihydro-2,3-dihydroxybenzoate  dehydrogenase 

T6341961 

1.00 

0.53 

TonB  dependent  ferrisiderophore  receptor  protein 

T634_2182 

0.64 

entB  -  siderophore  cluster  5  -  isochonsmatase 

T634  1962 

1.01 

0.67 

Heavy  metals 

acr3-arsenical  resistance  protein 

T6341670 

-1.18 

czcB/A.I  -cobalt-zinc-cadmium  resistance 

T6343607 

-0.75 

Efflux 

AdeA  -  efflux  cluster  1 

T6342009 

0.55 

AdeC  -  efflux  cluster  1 

T634_2007 

0.56 

abeM. 3-MATE  family  efflux  pump 

T634_3767 

-0.65 

adeT.2-RND  family  efflux  pump 

T6 34  0006 

-0.67 

-0.54 

K  antigen 

KL2.20 

T6340109 

-0.59 

KL2.21 

T634  0110 

-0.55 

beta-lactamase 

AmpC/ADC.1 

T634_2715 

0.55 

OXA-51.3 

T634  1739 

-0.79 

TEM-1 

T6340270 

0.88 

Cell-cell  interaction 

Type  VI  secretion  system.  1 1 

T6341371 

-2.13 

-2.14 

Type  VI  secretion  system.  12 

T634  1372 

-1.14  ~1 

Type  VI  secretion  system. 13 

T6341373 

-1.11 

hemin  utilization 

biopolymer  transport  protein  ExbD/ToIR 

T6341146 

■0.48 

Other 

class  A  beta-lactamase 

T6340314 

-0.52 

AMP-binding  enzyme 

T6340689 

1.40 

1.54 

transporter,  major  facilitator  family  protein 

T6340763 

1.12 

1.87 

hypothetical  protein 

T634~1278 

1.27 

cold  shock-like  protein  CspE 

T634J351 

1.42 

universal  stress  family  protein 

T634  1403 

-0.67 

ABC  transporter.  ATP-binding  protein 

T6341657 

1.41 

1.68 

taunne  dioxygenase 

T6341659 

1.10 

1.67 

hypothetical  protein 

T6 34  1935 

1.68 

universal  stress  family  protein 

T634  2216 

-0.94 

bacterial  transmembrane  pair  family  protein 

T634~2308 

1.11 

1.65 

hypothetical  protein 

T634_2339 

1.69 

lipocalin-like  protein 

T634  2472 

1.05 

beta-alinine-pyruvate  transaminase 

T634  2517 

1.17 

1.76 

0.84 

PF04328  family  protein 

T6342789 

0.74 

cartoon  starvation  protein  CstA 

T6 34  2790 

0.59 

CRISPR  type  l-F/YPEST-associated  protein  Csy3 

T6342844 

0.52 

0.56 

CRISPR-associated  endonuclease  Casl,  subtype  l-F/YPEST 

T634_2848 

0.56 

0.91 

senne  hydrolase 

T634_2916 

1.54 

phage  protein  F-like  protein 

T634  3147 

0.91 

DNA-binding  transcriptional  regulator  Cro 

T634J3172 

1.03 

hypothetical  protein 

T634  3179 

0.78 
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Appendix  III.  Select  differentially  expressed  genes  in  K.  pneumoniae  confrontations.  LogFC  values  are 


only  shown  if  the  corresponding  FDR  was  <  0.05. 


MRSN  1319 

Locus  ID 

LogFC  of  K.  pneumoniae  confronted  with: 

Annotation 

A.  baumannii 

E.  hormaechei 

C.  jeikeium 

Transporter 

ABC  transporter,  ATP-binding  protein 

T643  A0984 

-1.36 

ABC  transporter,  permease  protein 

T643  A2983 

-1 .40 

Cell  metabolism 

diacetyl  reductase  ((S)-acetoin  forming) 

T643  A2465 

0.92 

cytochrome  d  ubiqumol  oxidase,  subunit  II 

T643  A1048 

-1 .07 

-0.73 

spermidine  export  protein  Mdtl 

T643  A1929 

-0.71 

pyruvate,  water  dikinase 

T643  A2572 

•1.19 

hydroxylamine  reductase 

T643  A1263 

0.85 

Iron 

ferrienterobactin  receptor 

T643  A0904 

-2.73 

-0.95 

Transcnption 

DNA  gyrase,  A  subunit 

T643  A3091 

-1.00 

HTH-type  transcriptional  regulator  TdcA 

T643  A2708 

1.52 

Other 

PF07256  family  protein 

T643  A0 102 

2.20 

PF09209  domain  protein 

T643  All 31 

1.38 

conserved  hypothetical  protein 

T643A0625 

0.84 

biofilm  formation  regulatory  protein  BssS 

T643J\0534 

1.09 

transcnptional  regulator,  LuxR  family 

T643_A0769 

1.45 

putative  PQQ-like  domain  protein 

T643J\0825 

2.02 

-1.74 

plasmid  stabilization  system  protein,  RelE/ParE  family 

T643_A0921 

1.13 

universal  stress  family  protein 

T643_A0955 

0.82 

universal  stress  family  protein 

T643_A1096 

0.81 

putative  multiple  stress  resistance  protein  BhsA 

T643_A1141 

1.22 

universal  stress  family  protein 

T643_A1798 

0.68 

universal  stress  family  protein 

T643_A1952 

1.62 

multiple  antibiotic  resistance  protein  MarR 

T643_A1998 

0.81 

coenzyme  PQQ  biosynthesis  protein  A 

T643~A2194 

1.52 

universal  stress  family  protein 

T643~A2817 

0.91 

O-antigen  biosynthesis  protein  WbnF 

T643_A2936 

0.68 

transcnptional  regulator,  LuxR  family 

T643_A3863 

1.14 

putative  multiple  stress  resistance  protein  BhsA 

T643~A4136 

1.81 

universal  stress  protein  B 

T643  A4389 

1.06 

quaternary  ammonium  compound-resistance  protein  SugE 

T643~A5092 

1.08 

toxin  RelE 

T643_A5190 

0.94 

antirestriction  protein  ArdA 

T643_A5522 

2.21 
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Appendix  IV.  Bibliography  of  all  publications  and  meeting  abstracts  (reverse  chronological  order). 


Chan  AP,  Sutton  G,  DePew  J,  Krishnakumar  R,  Choi  Y,  Huang  XZ,  Beck  E,  Harkins  DM,  Kim  M,  Lesho  EP, 
Nikolich,  M.  P.,  and  Fouts,  D.  E.  A  novel  method  of  consensus  pan-chromosome  assembly  and  large-scale 
comparative  analysis  reveal  the  highly  flexible  pan-genome  of  Acinetobacter  baumannii.  Genome  Biol 
2015,16:143. 

Chan,  A.  P.,  DePew,  J.,  Choi,  Y.,  and  Fouts,  D.  E.  “An  Informatics  Approach  to  Identify  A.  baumannii 
Resistance  Island  Signatures”  A  poster  presented  at  the  Military  Health  System  Research  Symposium,  Fort 
Lauderdale,  FL,  August  18-21,  2014. 

Oliver-Kozup,  H.  A.,  DePew,  J.,  Chan,  A.,  Fouts,  D.  E.  “Genomic  and  Transcriptomic  Characterization  of 
Multi-Drug  Resistant  Bacteria  Associated  with  Combat  Wound  Infections”  A  poster  presented  at  the  2nd  Annual 
Host  Pathogen  Interactions  in  Biodefense  and  Emerging  Infectious  Diseases  Conference,  George  Mason 
University,  Manassas,  VA,  November  21,  2013. 
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Dr.  Yongwook  Choi,  Ph.D. 

Dr.  Radha  Krishnakumar,  Ph.D. 

Dr.  Heaven  Oliver-Kozup,  Ph.D. 

Jessica  DePew 

Maria  Kim 

Derek  Harkins 
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Appendix  VI.  Quad  Chart. 


Characterization  and  Expression  of  Drug  Resistance  Genes  in  MDRO*s  Originating 
from  Combat  Wound  Infections 
DM110352  Final  Report 
W81 XWH-1 2-2-0 1 06 

PI:  Dr.  Derrick  E.  Fouts,  Ph.D.  &  Dr.  Agnes  Chan,  Ph.D.  Org:  J.  Craig  Venter  Institute  Award  Amount:  $  968,624.00 


Study/Product  Aim(s) 

•  Aim  1:  Characterize  the  regulatory  gene  network  of  MDROs 
when  challenged  in  co-culture  with  other  bacteria. 

•  Aim  2:  To  understand  the  interaction  of  MDROs  with  human  cells 
in  a  cell  culture  model. 


Approach 

This  study  uses  advanced  genomic  and  bioinformatics  approaches 
coupled  to  cell  culturing  to  identify  and  characterize  factors  that 
influence  virulence  (including  antibiotic  resistance,  and  biofilm 
formation)  in  multidrug-resistant  organisms  (MDROs).  Deep 
next-generation  cDNA  sequencing  is  being  used  as  a  tool  to 
dissect  the  regulatory  networks  altered  during  these 
organismal-level  confrontations.  Cell-cell  challenges  include: 
MDROs  versus  human  microbiome  organisms,  MDROs  versus 
other  MDROs  and  MDROs  versus  the  human  host  via  cell 
culture. 


New  treatments 


Multidrug-resistant  Organism 
Repository  and  Surveillance 
Network  (MRSN) 


Identify  and  characterize 
High  priority  MDROs 


US  ARMY  MEDICINE 


Shareinfo.  J.  Craig  Venter ' 

INSTITUTE 

-  Characterize  drug  resistance  mechanisms 

-  Determine  bacterial-bacterial  interactions 

-  Understand  bacterial-Eukaryotic  interactions 


Accomplishments:  1)  Manuscript  published  on  pan-genome  comparison  of  A.  baumannii 
in  Genome  Biology;  2)  Bacterial-Bacterial  co-culture  experiments  are  completed. 

Statistical  analysis  of  RNA-seq  data  was  performed,  biological  interpretation  and 
manuscript  were  initiated;  3)  MDRO:Human  HDF  cell  confrontations  experiments  are 
completed.  Boinformatic  analysis  of  MDRO:Human  HDF  cell  confrontation  data  has  been 
completed. 


Timeline  and  Cost 


Activities 

CY 

12 

13 

14+ 

T 

1.  Transcriptome  profiling  of  MDRO- 
bacteria  interaction 

2.  Confrontation  assays  to  monitor 
interaction  of  MDROs  with  human 
cells 

Estimated  Budget  ($968K) 

$162K 

$485K 

$321  K 

Updated:  September  9,  2016 


Goals/Milestones 

CY1 2/1 3  Goals  -  Sequence  25  MDROs  from  MRSN  &  perform 
bacterial-bacterial  confrontation  assays 
0MDRO  bacterial  isolate  selection 

0Genomic  library  construction  and  whole  genome  sequencing 

0Sequence  assembly  and  mapping 

0Genome  annotationand  analysis 

0B  a  cte  rial-bacteria  I  confrontation  assay 

0Bacterial  RNA-Seq  and  gene  expression  analysis 

CY1 3/14+  Goal  -  Perform  bacterial-eukaryotic  confrontation  assays 

0Primary  human  skin  cell  culture 

0B  a  cte  rial-eukaryotic  confrontation  assay 

0Deposit  raw  sequence  data  from  confrontation  assays  in  NCBI  SRA 
0B  a  cte  rial-host  RNA-Seq  and  gene  expression  analysis 
Comments/Challenges/lssues/Concerns 
•  Administrative  delays  noted  in  technical  report. 

Budget  Expenditure  to  Date 
Projected  Expenditure:  $968K 
Actual  Expenditure:  $968K 
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